
It is to be regretted that many people have a tendency to accept sta- 
tistical data without question. To them, any statement which is pre- 
sented in numerical terms is correct and its authenticity is automatically 
established. Shortly after the retirement of a clerical employee of a rail- 
road, it was announced in the press that during his 43 years of employment 
he had commuted a total of 1,200,000,000 miles. Most readers of the
statement probably accepted it without question. As a matter of fact, in
order for the figure to be correct the employee would have had to travel
approximately 3,200 miles each and every hour of every day during the 
entire 43 years! 
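
The check on the reported figure is simple arithmetic and can be sketched as follows. The year length of 365.25 days and the function name are our own assumptions for illustration; the mileage and the years of service are those quoted above.

```python
# Sanity check of the reported commuting total.
# Assumption: 365.25 days per year, travelling around the clock.
REPORTED_MILES = 1_200_000_000
YEARS_EMPLOYED = 43


def required_speed_mph(total_miles: float, years: float) -> float:
    """Average speed needed if the traveller never stopped for 'years' years."""
    hours = years * 365.25 * 24
    return total_miles / hours


speed = required_speed_mph(REPORTED_MILES, YEARS_EMPLOYED)
print(f"Required average speed: {speed:,.0f} miles per hour")
```

A figure of this order, for a commuter on a railroad, is enough to show that the published total cannot be right.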

Presentation. Either for one’s own use or for the use of others, the 
data must be presented in some suitable form. Usually the figures are 
arranged in tables or represented by graphic devices as described in Chap- 
ters 3 to 6. 

Analysis. In the process of analysis, data must be classified into 
useful and logical categories. The possible categories must be considered
when plans are made for collecting the data, and the data must be
classified as they are tabulated and before they can be shown graphically.
Thus the process of analysis is partially concurrent with collection and
presentation. 

There are four important bases of classification of statistical data: (1) 
qualitative, (2) quantitative, (3) chronological, and (4) geographical, each
of which will be examined in turn. 

Qualitative. When, for example, employees are classified as union or 
non-union, we have a qualitative differentiation. The distinction is one
of kind rather than of amount. Individuals may be classified concerning
marital status, as single, married, widowed, divorced, and separated.
Farm operators may be classified as full owners, part owners, managers,
and tenants. Natural rubber may be designated as plantation or wild, 
according to its source. 

Quantitative. When items vary in respect to some measurable
characteristic, a quantitative classification is appropriate. Families may be
classified according to the number of children. Manufacturing concerns 
may be classified according to the number of workers employed, and also 
according to the value of goods produced. Individuals may be classified 
according to the amount of income tax paid. 

Most quantitative distributions are frequency distributions. The data 
of Table 8.3 show a frequency distribution of grades received by the 1952 
graduating class of the United States Merchant Marine Academy. A 
number of other frequency distributions are shown in Chapters 8, 9, and 
10.

Sometimes, qualitatively classified data may be reclassified on a
quantitative basis by making very slight changes. The assets of a bank may be
listed in respect to degree of liquidity (cash, due from banks, United States
securities, marketable securities, call loans, eligible paper, other loans, real
estate loans, real estate, and furniture and fixtures). Although these
categories differ from one another in a more or less unassignable
quantitative fashion, the classification is actually made upon a qualitative basis.
If we should reclassify the bank assets according to the length of time 
required to convert each into cash, the classification would be quantita- 
tive. In general the assets would be in the same order as before, but a few 
specific items among the less liquid qualitative groups (for example, cer- 
tain real estate and real estate loans) would be convertible into cash in a 
relatively short time. 

Chronological. Chronological data or time series show figures
concerning a particular phenomenon at various specified times. For example, the
closing price of a certain stock may be shown for each day over a period 
of months or years; the birth rate in the United States may be listed for 
each of a number of years; production of coal may be shown monthly for 
a span of years. The analysis of time series, involving a consideration of 
trend, cyclical, periodic (seasonal), and irregular movements, will be 
discussed in Chapters 11 to 16.

In a certain sense, time series are somewhat akin to quantitative distri- 
butions in that each succeeding year or month of a series is one year or one 
month further removed from some earlier point of reference. However, 
periods of time (or, rather, the events occurring within these periods)
differ qualitatively from each other also. The essential arrangement of 
the figures in a time sequence is inherent in the nature of the data under 
consideration. 

Occasionally a time series may be converted into a frequency distribu- 
tion. If a railroad company has kept records of the number of railroad 
ties replaced each year, the data constitute a time series. When the same 
information is used in conjunction with the dates of installation, the life 
of the various ties may be expressed as a frequency distribution, showing 
perhaps: 

Length of life            Number of ties
4 but under 5 years              2
5 but under 6 years              5
6 but under 7 years             17
etc.                           etc.
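
The conversion from dated records to a frequency distribution of length of life can be sketched as below. The installation and replacement dates (expressed as decimal years) are invented for illustration and are not the railroad's figures; the function name is likewise hypothetical.

```python
import math
from collections import Counter

# Hypothetical (installed, replaced) dates for individual ties, in decimal years.
tie_records = [
    (1930.5, 1935.0),
    (1931.0, 1935.75),
    (1931.25, 1936.5),
    (1932.0, 1937.5),
    (1932.5, 1938.25),
    (1933.0, 1939.5),
]


def life_distribution(records):
    """Count ties by class of life: key n means 'n but under n + 1 years'."""
    return Counter(math.floor(replaced - installed) for installed, replaced in records)


dist = life_distribution(tie_records)
for years in sorted(dist):
    print(f"{years} but under {years + 1} years: {dist[years]}")
```

The yearly replacement counts alone would not suffice; it is the pairing of each replacement with its installation date that makes the reclassification possible.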

Geographical. The geographical distribution is essentially a type of 
qualitative distribution, but is generally considered as a distinct classifica- 
tion. When the population is shown for each of the states in the United 
States, we have data which are classified geographically. Although there 
is a qualitative difference between any two states, the distinction that is
being made is not so much one of kind as of location. Geographically 
classified data are shown in Tables 3.1 and 3.4 and in Charts 6.19-6.22. 

Sometimes a geographical distribution may be put into the form of a 
frequency distribution. Thus, if we had data of the yield of corn per
acre in each county of Iowa, we would have a geographical series. These 
data may be put into the form of a frequency distribution by stating the 
number of counties having yields per acre of “10 and under 15 bushels,”
“15 and under 20 bushels,” and so forth.

The presentation of classified data in tabular and graphic form is but 
one elementary step in the analysis of statistical data. Many other proc- 
esses are described in the following pages of this book. Statistical inves- 
tigation frequently endeavors to ascertain what is typical in a given 
situation. Hence all types of occurrences must be considered, both the
usual and the unusual. 

In forming an opinion, most individuals are apt to be unduly influenced
by unusual occurrences and to disregard the ordinary happenings. In 
any sort of investigation, statistical or otherwise, the unusual cases must 
not exert undue influence. Many people are of the opinion that to break 
a mirror brings bad luck. Having broken a mirror, a person is apt to be 
on the lookout for the expected “bad luck” and to attribute any untoward
event to the breaking of the mirror. If nothing happens after the mirror 
has been broken, there is nothing to remember and this result (perhaps the 
usual result) is disregarded. If bad luck occurs, it is so unusual that it is 
remembered, and consequently the belief is reinforced. The scientific 
procedure would include all happenings following the breaking of the 
mirror, and would compare the “resulting” bad luck to the amount of bad
luck occurring when a mirror has not been broken. 

Statistics, then, must include in its analysis all sorts of happenings. If 
we are studying the duration of cases of scarlet fever, we may study what
is typical by determining the average length and possibly also the diver- 
gence below and above this average. When considering a time series 
showing steel-mill activity, we may give attention to the typical seasonal 
pattern of the series, to the growth factor (trend) present, and to the 
cyclical behavior. Sometimes it is found that two sets of statistical data 
tend to be associated. In Chapter 19 it is pointed out that there is an 
association between temperature and the rapidity with which crickets 
chirp. If the temperature increases, the crickets chirp faster; if the tem- 
perature decreases, the crickets chirp more slowly. The relationship can 
be expressed mathematically and we can estimate the rapidity of crickets’ 
chirps from the temperature; or, conversely, we can make a good estimate
of the temperature based upon the rapidity of chirps. 
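
A minimal sketch of such a mathematical relationship is a straight line fitted by least squares. The five observations below are invented for illustration (they are not the data of Chapter 19), and the helper name is our own.

```python
# Hypothetical observations: (temperature in degrees F, chirps per minute).
observations = [(60, 82), (65, 98), (70, 121), (75, 139), (80, 160)]


def fit_line(pairs):
    """Least-squares line y = intercept + slope * x for (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Estimate chirps from temperature.
intercept, slope = fit_line(observations)
chirps_at_72 = intercept + slope * 72

# Estimate temperature from chirps: fit the line the other way round.
t_intercept, t_slope = fit_line([(chirps, temp) for temp, chirps in observations])
temp_at_130 = t_intercept + t_slope * 130

print(f"Estimated chirps per minute at 72 degrees: {chirps_at_72:.1f}")
print(f"Estimated temperature at 130 chirps per minute: {temp_at_130:.1f}")
```

Note that the second estimate uses a separately fitted line; unless the points lie exactly on a straight line, it is not simply the first equation solved backwards.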

Occasionally a statistical investigation may be exhaustive and include
all possible occurrences. More frequently, however, it is necessary to
study a smaller group or sample. If we desire to study the expenditures
of lawyers for life insurance, it would hardly be possible to include all
lawyers in the United States. Resort must be had to a sample; and it is
essential that the sample be as nearly representative as possible of the
entire group, so that we may be able to make a reasonable inference as to
the results to be expected for an entire population. The problem of
selecting a sample is discussed in the following chapter. In Chapters 24,
25, and 26 an attempt is made to determine how much reliance may be
placed in the results obtained from samples.

Sometimes the statistician is faced with the task of forecasting. He
may be required to prognosticate the sales of automobile tires a year
hence, or to forecast the population some years in advance. Several
years ago a student appeared in a summer session class of one of the
writers and in a private talk announced that he had come to the course for
a single purpose: to get a formula which would enable him to forecast the
price of cotton. It was important to him and to his employers to have
some advance information on cotton prices, since the concern purchased
enormous quantities of cotton. Regrettably, the young man had to be
disillusioned. To our knowledge, there are no magic formulae for
forecasting. This does not mean that forecasting is impossible; rather it
means that forecasting is a complicated process of which a formula is but
a small part. And forecasting is uncertain and dangerous. To attempt
to say what will happen in the future requires a thorough grasp of the
subject to be forecast, up-to-the-minute knowledge of developments in
allied fields, and recognition of the limitations of any mechanical
forecasting device. Further comments concerning forecasting are to be found
in Chapter 22.

Interpretation. The final step in an investigation consists of
interpreting the data which have been obtained. What are the conclusions
growing out of the analysis? What do the figures tell us that is new or
that reinforces or casts doubt upon previous hypotheses? The results
must be interpreted in the light of the limitations of the original material.
Too exact conclusions must not be drawn from data which themselves are
but approximations. It is essential, however, that the investigator
discover and clarify all the useful and applicable meaning which is present in
his data.


A FEW IMPROPRIETIES 

The research worker must be constantly on the alert to avoid any
misuses of his material. Illogical and careless reasoning or improper use of
data will destroy the value of a study which may be technically acceptable
in its earlier phases. A few examples of fallacious procedures may clarify
this point. In later chapters of the book, other fallacies are occasionally
mentioned in connection with the methods to which they apply.

Bias. The presence of bias on the part of an investigator is, obviously,
sufficient to discredit the entire undertaking. Bias may be conscious or
deliberate; in such a case it is synonymous with falsification. On the
other hand, an unconscious bias may be operative, and this, perhaps, is a
more dangerous form, since the analyst himself may not be aware of it.
The following is an illustration of apparently unconscious bias:¹

A friend had invited an acquaintance to lunch, and found at the end
of the meal that he had left his purse in the office and had no money.
The acquaintance, at his request for a loan, took out a five-dollar bill
and a ten-dollar bill. My friend took one of them (to this day he does
not know which), telling his acquaintance not to let him forget the loan.
He did forget it, however, until several weeks later when they met again,
and each wrote on a piece of paper the sum he thought had been
borrowed. The lender wrote ten, and the borrower five. They were both
psychologists, so each searched his memory carefully, and each had
circumstantial evidence that seemed to each conclusive, to prove himself
right. Neither cared about or needed the money especially, but to them
it indicated a universal principle, that each of us interprets and
remembers facts in the form most agreeable to himself. No wonder both sides
must be represented in courts of law, and that much honestly given
evidence must be rejected!

As will be seen in the following chapter, statistical data cannot be
picked out of thin air as the conjurer appears to produce coins at his finger
tips. The process is one requiring care and attention to details. The
data, when obtained, should be of value and not be casually disregarded.
Note what a reviewer said of a certain author:

Blank is thorough and undaunted. Have statistics on any subject
been collected before? He has collected more and better ones. If it is
by its intrinsic nature unchartable, he has charted it none the less. . . .
Chronology itself fares badly in his hands at times. If his examples
require to be a century or two misplaced, Blank can forget even his
statistics and his charts in the good cause of logic.

¹ From “The Mind of a Child,” by Jessica Cosgrove, Good Housekeeping, January
1927, p. 206.

Omission of important factor. Shortly after the introduction of
the all-metal top for automobiles, a certain manufacturing company felt
called upon to prove that all-metal tops did not result in hotter car
interiors. They suggested a test involving three steps:

1. Take a piece of top fabric about 8 inches square. Place a piece of
lining material of similar size beneath the fabric, and a thermometer
beneath the lining material.

2. Take a piece of highly finished steel about 8 inches square. Place
similar sized pieces of ^inch felt and lining material beneath the 
metal, and a thermometer beneath the lining material. 

3. Place each of the above assemblies on a board at room temperature. 
Carry the entire apparatus out into hot sunshine, leave it exposed 
for about 10 minutes, and then read the temperature of the two 
thermometers. 

The difficulty with the above experiment is that the reader is asked, in 
step 2, to use a piece of highly finished steel. Automobile tops are painted
(some of them with black or a dark color of paint) and therefore absorb
more heat than does highly finished steel. The obvious fallacy in the test 
vitiates the experiment, although the additional insulation may actually 
make the metal-top car cooler than the fabric-top car. 

Carelessness. We cannot go through life without making mistakes, 
but carelessness should be reduced to a minimum. The wife of one of the 
authors wrote to a large department store to ask the size of a cedarized 
storage chest. The reply said, “This merchandise is available in the
W* X 1" X li" size.”

Many of us have received sealed envelopes minus enclosures, or postal 
cards blank on the message side, and have, perchance, been guilty of send- 
ing the grocer's bill back to the grocer minus the check or with the check 
unsigned. 

A study of salaries was under way and a certain corporation had been
requested to furnish data concerning its employees. A note to its report 
appeared substantially as follows: “All salaries under $5,000 per annum 
are shown as the maximum for each type of work. The assistant to the 
auditor stated that the maximum is equivalent to a general average for 
each group.” Perhaps this is an illustration of a conscious bias on the
part of the assistant to the auditor. It must be obvious that, if the maxi- 
mum and the average are the same, then there are no values below the 
maximum. 

A chain store advertised chuck roast at 49 cents per pound. In one of
its stores there were nine chuck roasts, all wrapped in transparent
material and labelled as to price per pound (49¢), weight, and price for the
piece. Three of the roasts were marked as follows: 3 lb. 9t oz., $2.92; 
4 lb. 15f oz., $4.05; 4 lb. 12i oz., $3.86. Division of these prices by their 
weights will show that the charge was at the rate of 81 cents per pound, a 
price much higher than that current at the time for chuck roast. Several 
months later similar mispricings were observed in the same store for legs 
of lamb, so possibly this illustration should be listed under a heading
other than “carelessness.”
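
The division of price by weight can be sketched as follows. The fractional ounces are not legible in the copy at hand, so, as an assumption of this sketch, each weight is merely bracketed between its whole-ounce value and one ounce more; the resulting rate is therefore given as a range rather than a single figure.

```python
ADVERTISED = 0.49  # dollars per pound, as advertised

# (pounds, whole ounces, marked price in dollars) for the three roasts.
# The fraction of an ounce on each label is unknown here and is bracketed.
roasts = [(3, 9, 2.92), (4, 15, 4.05), (4, 12, 3.86)]


def rate_bounds(pounds, ounces, price):
    """Return (low, high) dollars per pound for a weight lying between
    'ounces' and 'ounces + 1' ounces above the whole pounds."""
    lightest = pounds + ounces / 16
    heaviest = pounds + (ounces + 1) / 16
    return price / heaviest, price / lightest


for lb, oz, price in roasts:
    low, high = rate_bounds(lb, oz, price)
    print(f"{lb} lb {oz} to {oz + 1} oz at ${price:.2f}: "
          f"{low:.3f} to {high:.3f} dollars per pound")
```

Every one of the three ranges contains 81 cents and lies far above the advertised 49 cents, whatever the illegible fractions may have been.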

Non sequitur. A weekly news magazine, the circulation of which
had been growing in a healthy fashion, undertook to demonstrate for a 
particular year that its readers greatly exceeded its circulation. After 
showing figures of its circulation, the magazine stated: “And each of these 
subscribers represents 3.26 cover-to-cover readers, according to former
Deputy Police Commissioner ______, who counted and identified [sic]
216,948 fingerprints on copies his operatives had picked up at random
from subscribers’ homes in seven different cities or towns.” How could 
the investigator know the fingerprints belonged to cover-to-cover readers? 
Or, did he find each fingerprint on every page and, if so, does that prove 
each page was read? Do you ever actually read a magazine from cover
to cover? 

Non-comparable data. In July 1936, newspapers carried reports of 
a meeting of the American College of Osteopathic Obstetricians at which a 
doctor is reported, by a metropolitan paper, to have stated that the
maternal death rate among mothers treated by osteopathic physicians was less
than half that among cases handled by the medical profession. The 
higher rate in the latter instance was said to be due to excessive use of 
anaesthetics, interruption of labor, and undue reliance on mechanical 
devices. A survey of 14,000 osteopathic delivery cases was said to show
a maternal death rate of 2.8 per thousand cases. This figure was com- 
pared with the nation’s average of more than 6 per thousand. It should 
be obvious that the average rate for the entire country is not
representative of the rate for cases attended by the medical profession, since many
maternity cases are not attended by physicians.

The makers of a small, inexpensive car had been stressing the fact that 
the introduction of their car had converted many used-car buyers into 
new-car owners. Concerning costs of operation, they pointed out that 
“owners report up to thirty-five miles to the gallon of gasoline, which 
compared with the average mileage obtained with a used car ... is a 
saving of great importance to persons in the low-income group.” The
comparison of maximum mileage for one type of car with average mileage 
for other types of used cars is certainly unjustified. 

Confusion of association and causation. Sometimes factors which 
are associated are erroneously regarded as being causally related. A
southern meteorologist discovered that the fall price of corn is inversely 
related to the severity of hay fever cases. This does not imply that the 
low price of corn causes hay fever to be severe, nor does it imply that 
severe cases of hay fever bring about a drop in the price of corn. The
price of corn is generally low when the corn crop has been large. When
the weather conditions have been favorable for a bumper corn crop, they
have also been favorable for a bumper crop of ragweed. Thus the fall
price of corn and the suffering of hay fever patients may each be traced
(at least partly) to the weather, but are not directly dependent upon each 
other. A further discussion of association and causation is given in 
Chapter 19. 
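
The way a common cause produces an association between two factors that do not act on each other may be illustrated by a small simulation. Every number below (the spread of the weather, the size of the random disturbances, the seed) is an assumption made purely for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson coefficient of correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(years=2000, seed=0):
    """Corn price and hay fever severity both depend on weather only."""
    rng = random.Random(seed)
    prices, severities = [], []
    for _ in range(years):
        weather = rng.gauss(0, 1)                   # the common cause
        corn_crop = weather + rng.gauss(0, 0.5)     # good weather, large crop
        ragweed = weather + rng.gauss(0, 0.5)       # good weather, much ragweed
        prices.append(-corn_crop + rng.gauss(0, 0.5))    # large crop, low price
        severities.append(ragweed + rng.gauss(0, 0.5))   # much ragweed, severe cases
    return pearson(prices, severities)


r = simulate()
print(f"Correlation of corn price with hay fever severity: {r:.2f}")
```

The price never enters the computation of severity, nor severity that of price, yet the two series are strongly and inversely correlated.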

Another instance of the confusion of association with causation
occurred in a statement by a research organization which, having studied
annual data, said, “When farm income goes up, factory payrolls
invariably follow, but they do not lead the procession. One is cause, the other
effect.” If such a procession does exist, it can hardly be shown by annual
data. If factory payrolls follow farm income, we should show that fact
by plotting monthly data as is done for other series in Chart 22.9 and
Chart 22.10. As to the causal relationship, it is fairly obvious that, while
an increase (or decrease) in farm income does have a corresponding effect
upon factory payrolls, the payrolls in turn have a reciprocal effect upon
farm income. Furthermore, both are dependent upon many other factors
which tend to affect the pattern of general business.

Insufficient data. Insufficient data result in a high degree of
uncertainty respecting any conclusion which may be made from them. A very
small sample may lead us to a correct conclusion, but we cannot place a
high degree of assurance in our conclusion. When a medical worker is
developing a new treatment, he does not announce its efficacy after trying
it out on a few individuals. He must have enough data so that he can be
relatively sure of results. If two or three subjects respond favorably, he
cannot be safe in claiming that the occurrences were not due to chance.
The favorable responses of these few might have come without the
treatment, or in spite of it! Of course, there must be a “control” group to
show how the subjects would respond without any treatment, or with the
usual treatment. Moreover, both the control group and the treated
group must be sufficiently large to warrant a conclusion. A discussion
of the reliability of values computed from samples is given in Chapters
24-26.
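
How easily a few favorable responses may arise by chance can be sketched with the binomial distribution. The assumed rate of 60 per cent improving without any treatment is hypothetical, chosen only to make the point; the function name is our own.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability of k or more favorable responses among n subjects when
    each responds favorably with probability p regardless of treatment."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Assumed for illustration: 60 per cent of patients improve with no treatment.
SPONTANEOUS_RATE = 0.6

small = prob_at_least(3, 3, SPONTANEOUS_RATE)     # all 3 of 3 subjects improve
large = prob_at_least(90, 100, SPONTANEOUS_RATE)  # 90 or more of 100 improve

print(f"Chance that 3 of 3 improve untreated: {small:.3f}")
print(f"Chance that 90 or more of 100 improve untreated: {large:.2e}")
```

Under this assumption three successes in three trials would occur by chance more than one time in five, whereas the same proportion of successes in a group of one hundred could scarcely be attributed to chance.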


Unrepresentative data. Conclusions may be based upon data which
are numerically sufficient, but which are not representative. A small
sample may be representative; on the other hand, a large sample may not
be representative.

An example of a conclusion based upon unrepresentative data is the
forecast of the 1936 presidential election as made by the Literary Digest.
More than 10,000,000 straw ballots were sent out by the Digest. Of
these, 2,376,523 were returned and they indicated that 370 electoral votes
would be cast for Landon and 161 for Roosevelt. The final election
results were 523 electoral votes for Roosevelt and 8 for Landon. The
difficulty was that the mailing lists used as a basis for the poll were rela- 
tively heavily weighted with persons in the upper economic brackets and 
thus were not representative of the entire voting population. 
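
The contrast between a large unrepresentative sample and a small representative one may be sketched by simulation. The two economic brackets, their shares of the electorate, their preferences, and the composition of the mailing list are all hypothetical figures, not those of the 1936 poll.

```python
import random

# Hypothetical electorate: share of voters in each bracket, and the
# probability that a voter in that bracket favors candidate A.
POPULATION_SHARE = {"upper": 0.25, "lower": 0.75}
FAVORS_A = {"upper": 0.65, "lower": 0.30}


def poll(n, upper_share, rng):
    """Proportion favoring A among n respondents, each of whom comes from
    the upper bracket with probability 'upper_share'."""
    votes = 0
    for _ in range(n):
        bracket = "upper" if rng.random() < upper_share else "lower"
        votes += rng.random() < FAVORS_A[bracket]
    return votes / n


rng = random.Random(1)
true_share = sum(POPULATION_SHARE[b] * FAVORS_A[b] for b in POPULATION_SHARE)
big_biased = poll(100_000, upper_share=0.70, rng=rng)  # list over-weights upper bracket
small_fair = poll(1_000, upper_share=POPULATION_SHARE["upper"], rng=rng)

print(f"True share for A:              {true_share:.3f}")
print(f"100,000 ballots, biased list:  {big_biased:.3f}")
print(f"1,000 ballots, representative: {small_fair:.3f}")
```

In this sketch the large poll points to a victory for A while the small one points, correctly, to a defeat; adding more ballots from the same list would only make the wrong answer more precise.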

Concealed classification. Conclusions drawn from statistical data 
may sometimes be invalid because of the presence of a concealed classi- 
fication which is overlooked. The fallacy of concealed classification is 
illustrated by some data which appeared in the Monthly Labor Review and
concerning which its readers were warned. Data were presented showing
the union wage rates in Hebrew and in non-Hebrew bakeries. It
appeared from the figures that Hebrew bakeries paid an average hourly
rate about 50 per cent higher than non-Hebrew bakeries. Qualifying this,
the Review said, “Although Hebrew bakeries generally have higher rates,
one reason for this large difference is the fact that a large proportion of
the Hebrew bakeries are located in New York City, where the average
of rates is higher than in other localities.”
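
The effect of such a concealed classification can be shown with invented figures. In the sketch below the two kinds of bakery pay exactly the same rate within each locality, yet their over-all averages differ because they are distributed differently between localities; the counts and rates are hypothetical and are not those of the Review.

```python
# Hypothetical data: (locality, kind of bakery, number of bakeries, hourly rate).
# Within each locality both kinds pay the same rate.
bakeries = [
    ("New York", "Hebrew", 80, 1.50),
    ("New York", "non-Hebrew", 20, 1.50),
    ("Elsewhere", "Hebrew", 20, 1.00),
    ("Elsewhere", "non-Hebrew", 80, 1.00),
]


def average_rate(rows, kind):
    """Average hourly rate for one kind of bakery, weighted by number of bakeries."""
    chosen = [(n, rate) for _, k, n, rate in rows if k == kind]
    total = sum(n for n, _ in chosen)
    return sum(n * rate for n, rate in chosen) / total


avg_hebrew = average_rate(bakeries, "Hebrew")
avg_other = average_rate(bakeries, "non-Hebrew")
print(f"Over-all average, Hebrew bakeries:     {avg_hebrew:.2f}")
print(f"Over-all average, non-Hebrew bakeries: {avg_other:.2f}")
```

Here the whole of the apparent difference is due to locality. A fair comparison would be made locality by locality.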

A concealed classification was found to be present in a study of suicides.
The data seemed to show that suicides were more likely to occur among
certain religious groups than among others. Upon further consideration
it was apparent that the matter of the urban or rural occurrence of the
suicides had been overlooked. Hence the conclusion should have been,
not that suicides tended to tie up with given religious groups, but that
suicides were more common in urban territories and that these religious 
groups were also more numerous in the cities. 

Failure to define units. In a pamphlet given to each motorist with
his renewal of an automobile vehicle or driver’s license, a state
automobile commissioner called attention to the fact that 26 years earlier the
“mileage death rate” had been 23.6 while in the year just ended there had
been a “mileage death rate” of 4.2. There was no explanation of whether
this was the number of deaths per mile (or per thousand miles) of
highway in the state, or the number of deaths per hundred, per thousand, or
per million miles of vehicle travel during the year. Certainly it was not 
deaths per mile of vehicle travel, although at a quick reading that was 
what it seemed to be. Inquiry revealed that the ratio was the number of 
highway fatalities per hundred million miles of vehicle travel. The 
mileage was obtained by multiplying the number of gallons of gasoline
sold in the state during the year by 13.12, the average mileage per gallon
of gasoline. Incidentally, one may well wonder about the accuracy of 
this average and how it was obtained. Gasoline sales were, of course, 
available from state tax records. 
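
Once the unit is stated, the ratio is easily reproduced. In the sketch below the factor of 13.12 miles per gallon is the one reported above; the numbers of deaths and of gallons sold are hypothetical, chosen only so that the result falls near the published 4.2.

```python
MILES_PER_GALLON = 13.12  # average mileage per gallon reported in the text


def mileage_death_rate(deaths, gallons_sold, per_miles=100_000_000):
    """Highway fatalities per 'per_miles' miles of vehicle travel, with the
    travel estimated from gasoline sales."""
    vehicle_miles = gallons_sold * MILES_PER_GALLON
    return deaths / vehicle_miles * per_miles


# Hypothetical state totals for one year.
rate = mileage_death_rate(deaths=551, gallons_sold=1_000_000_000)
print(f"Deaths per hundred million vehicle-miles: {rate:.1f}")
```

The same deaths expressed per million vehicle-miles would give a figure one hundred times smaller, which is why the unit must accompany the rate.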

Misleading totals. Those of us who read the sport pages of the
newspaper are likely to have noticed a statement each autumn to the
effect that a certain number of thousand (or million) fans had watched
the home team play during the baseball season just ended. For example, 
it was stated that 1,538,007 fans attended the home games of the New 
York Yankees during the 1953 baseball season. This figure was arrived 
at by adding the number of persons attending each home game. It does 
not, as is too often carelessly said or intimated, represent 1,538,007 fans, 
but rather the specified number of admissions, many individuals having
attended more than one game. 
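
The distinction between admissions and individuals is that between a sum and a count of distinct members. A toy illustration, with invented names standing for ticket holders at three home games:

```python
# Hypothetical gate records: the persons present at each of three home games.
games = [
    {"Ann", "Bob", "Cy"},
    {"Ann", "Bob"},
    {"Ann", "Dee"},
]

admissions = sum(len(game) for game in games)  # every entry through the gate
distinct_fans = len(set().union(*games))       # each person counted once

print(f"Admissions: {admissions}; distinct fans: {distinct_fans}")
```

The season total reported in the press corresponds to the first figure, not the second.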

A somewhat similar meaningless, but impressive-sounding, total was 
present in a statement made by a horticultural concern that had recently 
acquired another similar company, which itself represented a recent 
merger of two other concerns. The statement was to the effect that their 
combined horticultural experience now totalled 295 years. This figure 
was obtained by adding the ages of the three companies.

Poorly designed experiment. For an experiment to be valid, it 
must be so designed that the results which are arrived at cannot be
attributed to factors other than those which are under consideration. 
The illustration which follows will be mentioned again, in another con- 
nection, at the end of Chapter 25. At the time that fluorescent lighting 
was first introduced, some people believed that persons who were exposed 
to the radiation of the lights would become sterile. A railroad had 
already installed fluorescent lights and, hoping to counteract this belief, 
undertook an experiment in which one group of rats was subjected to 
incandescent light, while another group was subjected to fluorescent light. 
After a period of time the first group had the usual number of offspring,
while the second group had none! A skeptical executive asked that the
second group of rats be re-examined with care, and it was discovered that 
all of the rats of that group were of the same sex. It is elementary that 
the two groups should have had the same sex composition. 
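
One way of securing the same sex composition in both groups is to divide the animals of each sex separately, at random, between the two kinds of light. The sketch below is our own illustration and not the railroad's procedure; the group names, the twenty animals, and the seed are assumptions.

```python
import random


def assign_balanced(animals, rng):
    """Split animals into two groups having the same sex composition.
    'animals' is a list of (identifier, sex) pairs."""
    groups = {"incandescent": [], "fluorescent": []}
    for sex in sorted({s for _, s in animals}):
        same_sex = [a for a in animals if a[1] == sex]
        rng.shuffle(same_sex)  # chance, not the experimenter, decides
        half = len(same_sex) // 2
        groups["incandescent"] += same_sex[:half]
        groups["fluorescent"] += same_sex[half:2 * half]  # an odd animal is left out
    return groups


# Hypothetical colony of ten males and ten females.
rats = [(i, "F" if i % 2 else "M") for i in range(20)]
groups = assign_balanced(rats, random.Random(0))

for name, members in groups.items():
    females = sum(1 for _, sex in members if sex == "F")
    print(f"{name}: {len(members)} rats, {females} female")
```

With the groups so formed, a difference in the number of offspring can no longer be traced to one group's consisting wholly of one sex.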

RESEARCH METHODS 

It must not be assumed that the statistical method is the only method 
to use in research; neither should this method be considered the best 
attack for every problem. Just as the carpenter has a number of tools, 
each appropriate for a different sort of operation, so the researcher can 
avail himself of various techniques which are the tools of his trade and 
each of which is appropriate to a specific type of situation. If an amateur 
carpenter uses a screwdriver in lieu of a chisel, the results are not likely to 
be either workmanlike or satisfactory. Similarly, it is important that the 
investigator consider his problem carefully at the outset and make use of
the technique or techniques which are appropriate to it. Just as the 
carpenter needs to use more than one tool in completing a piece of work, 
so the research worker must often make use of, not one, but several 
methods.* 

When we desire a great deal of information concerning each individual 
or occurrence to be studied, much of our data may be non-quantitative 
by its very nature. In such an event we employ the case study method of
investigation, the purpose of which is to consider in detail the character- 
istics peculiar to the individual case and to generalize from a number of 
such detailed studies. Some of the information obtained in a study of 
case histories (such as wages, number of offspring, and so forth) may be 
statistical, and when many cases are included, statistical summaries may 
be made of the non-quantitative information obtained. 

If interest centers in changes in behavior or attitudes, the panel
technique may be used. This consists of interviewing the same group of
people on two or more occasions. The panel procedure may obtain data 
of a quantitative nature when information concerning, for example, con- 
sumption habits and family budgets is obtained; as for case studies, 
statistical analyses may be made of non-quantitative information, such 
as opinions on public questions, if the panels are large enough. 

Sometimes a problem may be attacked by the historical approach. 
Although the historical method is largely descriptive and non-quantitative, 
we may find statistical aspects when we consider growth or decline of 
imports, exports, population, and other series. 

Again, the appropriate procedure may be to make use of the
experimental method, in which we allow only the factor we are studying to
vary, and attempt to control as many as possible of the other factors. 
For example, if we wished to study the effect of car weight upon tire 
mileage, we should control road conditions, speed, temperature, size of 
tire, quality of rubber and of cord, inflation of tire, and many other 
factors. 

In the social sciences, the experimental method can rarely be applied 
and certain aspects of the statistical method are used in lieu of it. We 
cannot, for example, ascertain the effect of different sorts of diets upon 
length of life, by forcing groups of people to live upon prescribed diets and 
by actually making all other phases of their lives identical. Instead, we 
must find groups of people on different diets, and then we must measure 
the importance of, and control statistically, as set forth in Chapter 21, as
many as possible of the other phases of their lives, since we cannot control 
them experimentally. The experimental and statistical methods are not 
antithetical, but under practical conditions the statistical method supple- 
ments the experimental method. If an experiment could be so designed 
that all variables were completely controlled, statistics might not be 
needed. At best we can usually control but a few of the more important 
factors, and thus it is necessary to evaluate statistically the importance 
of a host of other minor disturbing factors (sometimes designated as
“chance”), as described in Chapters 24-26.

Some problems may be approached by the deductive method rather than 
by the inductive method. When a hypothesis has been set up deductively 
and when quantitative data are available, statistics may enable an 
inductive test to be made of the hypothesis, and this test may serve to 
support or to discredit the hypothesis. Conversely, relationships arrived 
at statistically (as, for example, the rather close negative association 
found in some states concerning the size of farms and the value of land 
per acre) may suggest causal connections which may be worked out 
deductively. Again we have two methods which are not antagonistic,
but complementary. 

When a research worker undertakes the study of a topic, he may be
able to choose between collecting the data himself or obtaining the needed
figures from already available published or unpublished compilations.
If an individual or organization has prepared reliable data which are
pertinent to the problem, it is vastly less expensive to make use of the
existing information. Although to collect one's own data is more costly,
that procedure may enable the investigator to obtain exactly the
information which is needed to answer the specific questions that are under
consideration.

Not all readers will be faced with the problem of collecting original 
statistical data; many will find it possible to refer to existing sources for
information. However, the data from such sources may be evaluated 
and more intelligent use may be made of them if the research worker has 
some knowledge of the procedure and pitfalls involved in collecting, edit- 
ing, and marshalling statistical data. 

An illustration cited by Stamp* is to the point: Harold Cox, when a
young man in India, quoted some Indian statistics to a judge. The judge
replied, “Cox, when you are a bit older, you will not quote Indian
statistics with that assurance. The government are very keen on amassing
statistics -- they collect them, add them, raise them to the nth power, take
the cube root and prepare wonderful diagrams. But what you must
never forget is that every one of those figures comes in the first instance
from the chowty dar (village watchman), who just puts down what he
damn pleases.” It should be added that this story refers to the India of
a day long past. Today India has many able statisticians and an active
statistical society. Presumably the chowty dar no longer functions as the
source of local statistical information.

* Sir Josiah Stamp, Economic Factors in Modern Life, P. S. King and Son,
London, 1929, pp. 258-259.

The process of collecting statistical data will be examined first. Later
in the chapter, attention will be directed toward the use of statistical
sources.


COLLECTING STATISTICAL DATA 

Method of collection. Statistical data are frequently obtained by a
process in which the desired information is obtained from the
householder, business man, or other informant, either by having an enumerator
visit the informant, ask the necessary questions, and enter the replies on a
schedule, or by mailing to the informant a list of questions (sometimes
called a questionnaire) which he may answer at his convenience. The
data collected at each population census are obtained by the enumeration 
process, the enumerators undertaking to visit every place of abode in the 
United States. Sometimes information is obtained by registration, 
which means that the information is reported to the proper authority 
when, or shortly after, an event occurs. Thus births and deaths must be 
registered. In many states automobile accidents must be reported to 
the commissioner of motor vehicles. 

In general outline the problems of obtaining data by mailing question- 
naires, by enumeration, and by registration are similar. Under a system 
of registration there is, of course, the difficulty that many persons will 
neglect to register. Constant vigilance and frequent checkups are
necessary on the part of the registrar. Registration, however, is usually with
a properly designated government official, and there is ordinarily legal
compulsion that the data be supplied. Since most statistical information
is obtained by enumeration or by mailing questionnaires, the balance of 
this section will be devoted to the procedure for collecting data by these 
methods. 

Outline of procedure. The steps in a statistical investigation,
which involves the collection of data, may be designated as follows: 

1. Planning the study. 

2. Devising the questions and making the schedule. 

3. Selecting the type of sample, if the enumeration is not to be a com- 
plete one. 

4. Using the schedules to obtain the information. 

5. Editing the schedules. 

6. Organizing the data.

7. Making finished tables and charts. 

8. Analyzing the findings. 

The steps will usually be taken in the order shown, except that the 
decision concerning the type of sample to be used may be included as part 
of the first step. We shall discuss each of the eight steps in turn.

1. Planning the study. If a topic is to be studied statistically, it
behooves the investigator to become familiar, at the outset, with what 
has already been done by others. He may find that someone else has 
already examined the same topic and that his questions have already 
been answered. He may wish to design his study so that it can be com- 
pared with those which have preceded his. He will doubtless profit by 
the experience and the mistakes of others. He may find that the diffi- 
culties involved in the investigation of his topic are so great that they are 
insurmountable: the cost may be too great, or it may appear that inform- 
ants do not wish to divulge the type of information which is needed. 

Having studied what others have done, the investigator is ready to 
consider the general aspects of what he would like to know. If an 
employment and unemployment study is projected, there are many 
inquiries concerning each individual which are pertinent. The following
suggests some of the more important ones: 

Does the individual have any dependents? How many?

Is the person male or female? 

What is his or her marital status? 

How old is the person? 

Is he native white, native colored, or foreign born? If foreign born,
from what country? 

Does he own property?

What is his usual occupation? In what industry? 

What type of work is he doing at present? (If the study is a detailed 
one, consideration may be given to listing the job experience of the 
individual for a number of years, together with the wages received.) 

Is he employed full time? Part time? Is he entirely unemployed?

If the individual is working part time or is totally unemployed, 
what is the reason? 

If he is totally unemployed, how long has he been so? Also, is he
able to work and willing to work; or, alternately, is he actively looking 
for work? 

The reader will doubtless think of other questions of importance, but
these suffice to indicate the nature of this preliminary step. Usually we
cannot undertake to obtain answers to all the questions which are
important. It may be too expensive to make so comprehensive an inquiry.
There may be some questions (such as the one concerning property
ownership or a query in regard to wages) which informants will often
decline to answer. The most important and practicable questions are
therefore selected to form the basis of the inquiry. It is these which will
be incorporated into the schedule.

There are several matters of general importance which are often con- 
sidered in connection with laying out the general plan. One of these has 
to do with the extensiveness of the study. Will it include the entire 
community or merely a sample? If funds and enumerators are available,
we may make a complete enumeration; often we must be satisfied with a 
sample. We shall discuss the selection of the sample after we have com- 
pleted consideration of the schedule. 

Another problem concerns whether the schedule is to be sent out by 
mail (in which case it must be very simple and self-explanatory) or 
whether enumerators are to be used. If use is to be made of paid
enumerators, it is necessary to locate qualified persons. However, it is often
true that funds are not available to hire enumerators. In fact, it is
sometimes the case that, valuable as the results of an investigation might be,
they are not worth what it would cost to employ enumerators! Studies
have been made using, as unpaid enumerators, policemen, college
students, postmen, truant officers, and even school children. 

A third matter has to do with the place where the informants will be
interviewed. For an employment-unemployment study we could send
enumerators to interview people at their work, in the streets, or at home.
It is obvious that the last of the three is preferable. For the unemploy- 
ment study we should also consider whether or not to enumerate all the 
people in a household, irrespective of age, sex, desire for work, and mental 
or physical condition. To list everyone would give a complete picture, 
but it also involves much work. When making an employment study, 
we may not be interested in housewives who seek no work outside the 
home. We may be interested in elderly men, in an attempt to learn 
what proportion of the population is retired or is considered too old or 
infirm to work. Since young children are not ordinarily part of the labor
force, it may be desirable to exclude all persons below (say) 14 or 16 years
of age. For the purpose of the following illustration, we shall consider
that all persons over 14 years of age were enumerated.

2. Devising the questions and making the schedule. It has 
already been pointed out that not all the questions which we would like 
to have answered can be included in the schedule. Having selected those 
topics which we wish to include in our inquiry, we must formulate each 
question so that it may be readily and accurately answered, and then we
must draft the schedule form. The schedule form shown on page 19 is
one that might be used in a community study of employment and
unemployment. This schedule would, of course, be supplemented by a sheet
or booklet of instructions to the enumerators. The instructions would
explain what is meant by “household” and by “family,” since both terms
are used, whether age was to be “to nearest birthday” (the so-called
“insurance method”) or “to last birthday” (the so-called “census
method”), what categories are to be used for “race,” the meaning of the
terms “occupation” and “industry,” and so on.
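
The difference between the two age conventions named above can be made concrete with a short sketch. The function names, the respondent's birth date, and the treatment of 29 February birthdays are assumptions introduced here for illustration; only the two definitions and the reference date of March 20, 1954 come from the text and the schedule.

```python
from datetime import date

def age_last_birthday(born: date, on: date) -> int:
    """Completed years of age on the reference date (the "census method")."""
    had_birthday = (on.month, on.day) >= (born.month, born.day)
    return on.year - born.year - (0 if had_birthday else 1)

def _birthday_in(born: date, year: int) -> date:
    """Birthday falling in `year`; 29 February is moved to 1 March in non-leap years."""
    try:
        return born.replace(year=year)
    except ValueError:
        return date(year, 3, 1)

def age_nearest_birthday(born: date, on: date) -> int:
    """Age at whichever birthday lies closer to the reference date (the "insurance method")."""
    last = age_last_birthday(born, on)
    prev_bd = _birthday_in(born, born.year + last)
    next_bd = _birthday_in(born, born.year + last + 1)
    # Ties go to the earlier birthday; this is an arbitrary choice made for the sketch.
    return last + 1 if (next_bd - on) < (on - prev_bd) else last

reference = date(1954, 3, 20)   # week-ending date printed on the schedule
born = date(1913, 6, 1)         # hypothetical respondent
print(age_last_birthday(born, reference))     # 40
print(age_nearest_birthday(born, reference))  # 41
```

The same person is thus recorded as 40 under one instruction and 41 under the other, which is why the enumerators' instructions must state which convention applies.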

             Urbantown Employment-Unemployment Study, 1954

Name ....................   Area ..........   Household ..........
Address .................   Card ..........   Enumerator .........

1. Relation to head of household ..........
2. Age ......   3. Sex ......   4. ......

5. Regular employment:            6. Present employment:
   Occupation ..........             Occupation ..........
   Industry ............             Industry ............

7. Circle one number to indicate what this person was primarily doing
   during the week ending March 20, 1954:

   01 Working for compensation in money or “kind.”
   02 Self-employed.
   Has a job or is self-employed, but not at work because:
      03 On vacation.
      04 Bad weather.
      05 Labor dispute.
      06 Layoff of 30 days or less.
      07 Own sickness.
      08 Other ..........
   09 Not at work, new job to begin within 30 days.
   10 Not at work, looking for work.
   11 Casual worker, no regular job.
   12 Attending school.
   13 In the armed forces.
   14 Keeping house (not as employee).
   15 Unpaid worker on family farm or in family business.
   16 Volunteer worker, not on family farm or in family business.
   17 Retired.
   18 Physically or mentally unable to work.
   19 Inmate of institution.
   20 Other ..........

8. If this person worked at all last week, for compensation, or on family
   farm or in family business, or as a self-employed person, how many
   hours did he or she work? ...... hours.

9. If this person was looking for work, how many weeks has he or she been
   seeking employment? ...... weeks.

Remarks ..........

Urbantown Employment-Unemployment Schedule.

[Facsimile of questionnaire. Letterhead: Veterans Administration,
Washington 25, D. C. The veteran's name, address, and policy number
appeared at the top, followed by a letter from the Administrator of
Veterans' Affairs, the instruction “Please answer each of the following
questions which applies to you,” and a section headed “Cigarettes.”]

One Side of a Questionnaire Used by the Veterans Administration and the 
United States Public Health Service for a Study of the Use of Tobacco. The 
reverse side asked concerning cigars, pipe smoking, and use of chewing tobacco and 
snuff. 

A portion of another schedule, which was sent by mail to insured
veterans of World War I, is shown on page 20. The purpose of this 
schedule was to obtain data concerning the use of tobacco, which is to be 
studied in relation to cause of death as these veterans die. The schedule 
shown here contains only the section dealing with the use of cigarettes. 
Similar sections for cigars, pipe smoking, and use of chewing tobacco and 
snuff were on the reverse of the schedule. 

A third, and very simple, schedule is shown just below. This was a
postcard to be returned to the Country Gentleman magazine. This form
is of interest, not only because of its simplicity, but also because the Curtis
Publishing Company sent a “shiny new dime” as a “token of appreciation”
to those cooperating.

1. How is your mail delivered?   R.F.D. or Star route ......
   At Post Office ......   Door-to-door delivery ......
2. What is the occupation of the head of your household? ..........
3. What is his (or her) kind of business? ..........
4. Do you live on a farm or ranch?   Yes ......   No ......
5. If you do not live on a farm or ranch, does anyone in your household
   a. Own or rent farm land?        Yes ......   No ......
   b. Operate or work on a farm?    Yes ......   No ......
6. If you are not a farmer, what is your interest in Country Gentleman?
   ..........

Post-card Questionnaire Used by the Curtis Publishing Company.

The company states that a postcard questionnaire, such as the one shown,
will bring in a return of about 20 per cent when no coin is sent. When a
dime was sent, a return of 65 per cent was obtained. It was also found
that by using a quarter instead of a dime, the return could be brought up
to about 70 per cent.
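
The return rates just quoted can be set against the value of the coin enclosed. The following is a rough sketch only: the mailing size of 1,000 cards is hypothetical, printing and postage are ignored, and it is assumed that a coin goes out with every card whether or not the card comes back.

```python
# Return rates reported in the text for the post-card questionnaire.
return_rate = {"no coin": 0.20, "dime": 0.65, "quarter": 0.70}
# Face value, in dollars, of the coin enclosed with each card.
coin_value = {"no coin": 0.00, "dime": 0.10, "quarter": 0.25}

MAILED = 1000  # hypothetical mailing size, chosen only for illustration

def incentive_cost_per_return(option: str) -> float:
    """Value of coins sent out per completed card received."""
    return coin_value[option] / return_rate[option]

for option in return_rate:
    returns = round(MAILED * return_rate[option])
    cost = incentive_cost_per_return(option)
    print(f"{option:8s} returns={returns:4d}  coin cost per return=${cost:.3f}")
```

On these figures, moving from a dime to a quarter adds 50 returns per 1,000 cards at an added coin outlay of $150, or $3.00 for each additional return.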

The construction of statistical schedules is something which is learned 
most satisfactorily by actually making and using them. Nevertheless, 
there are some cautions which are helpful: 

(a) Clarity is essential. The entire schedule, as well as each question, 
should be as simple and as clear as possible. This is particularly true of 
schedules sent to, or left with, persons to be filled out at their convenience. 
An ambiguous question or a question that invites an ambiguous answer 
produces useless data and involves waste of time and money. One
organization, in making a study, queried some hundreds of parents: “Is
your child's outlook on life broader or narrower than yours was at the
same age?” The investigator presumably expected the replies to read
“Broader” or “Narrower.” Replies actually received, however, were
frequently “Yes,” “No,” “I doubt it,” and “I hope so,” none of which
had any meaning. Furthermore, the question is so worded as not to
allow for the fact that there may be two or more children in the family.

The inquiry concerning marital condition when put “Married or 
Single?” is open to two objections: (1) Either a “Yes” or a “No” answer
is meaningless; (2) not all persons are included in these two categories. 
One good way of asking this question is to say: 

Check whether:

    Single ......
    Married ......
    Widowed ......
    Divorced ......
    Separated ......

To clarify the meaning of “single,” the term “never married” is
sometimes used.

The investigator should not be satisfied merely with wording his
questions so that they can be understood; he should draft them so carefully
that they cannot be misunderstood.

(b) Not all questions can be accurately answered. No matter how clearly
a question is stated, there are some sorts of queries which are apt to elicit
unsatisfactory returns. The schedule used in 1950 for the Census of
Population and Housing of the United States asked for the age at last
birthday for each person enumerated. Reference to the published
results in 1950 Census of Population, Vol. II, Part 1, Table 94, shows some
peculiar irregularities in the distribution of the population by single years
of age. Beginning with age 25 and continuing through age 70, there are
definite concentrations of persons on every age ending in 0 or 5, except*
for age 55. For example, there are more people who were reported to be
25 than either 24 or 26 years of age. There are also secondary
concentrations upon some ages which are multiples of 2, most noticeable when
these even years of age are not adjacent to a multiple of five. Thus,
there are concentrations at 28, 32, 38, 42, and so on, through 62.
Furthermore, there seem to be too many males reported as 21 and too many
non-white females reported to be 18 years old. The Enumerator's
Reference Manual (p. 34) notes that some ages will be reported in round
numbers and warns the enumerators as follows: “Estimate of Age. If a
respondent gives an offhand estimate such as ‘around 60,’ try to find out
whether the person is nearer 58 or 59 or possibly 61 or 62. Try to get it as
accurate as possible. If age is not known, enter the estimate as the last
resort, and footnote it as an estimate. An entry of ‘21 plus’ is not
acceptable.”

* Non-white and foreign-born white showed concentrations on 56. Native white
showed a concentration on 54.

The rounding of ages is not peculiar to the United States Census; it may
be expected to occur in any inquiry where age is not obtained from birth
certificates or some other accurate record of date of birth. Some of the
factors believed to lead to reporting ages in round numbers are: (1) The
information concerning an individual is not necessarily furnished to the
enumerator by the person himself; it is often given by a relative, friend,
rooming-house keeper, or other person, and some of these informants
cannot have exact information. (2) When ages are intentionally misstated,
as they occasionally are, there is reason for believing that they are often
rounded. (3) Some persons are careless, or occasionally a person of low
intelligence may always think in terms of round numbers. Rounding is
most noticeable for those classes of the population in which the proportion
of illiterates is greatest. (4) A few persons do not know their exact ages.
(5) There may be carelessness on the part of enumerators. Some
improvement in the accuracy of reporting ages may be had by asking date
of birth instead of, or in addition to, age. It should be recognized,
however, that the posing of a more exact question does not produce better
data when exact knowledge is lacking, as in the case of a landlady
reporting for her roomers. Furthermore, the matter of the expense involved in
asking this additional question might more than offset the expected
increase in accuracy. When age is of primary importance, as in the case
of application for insurance, date of birth is usually asked and may be
verified by documentary evidence.

Another interesting example of thinking in terms of round numbers
occurred in the case of a contest sponsored by a motion picture theater.
An irregular-shaped glass jar was filled with cranberries, and six prizes
were offered to the patrons who guessed most nearly the correct number
of cranberries in the jar. An analysis of the 1,996 guesses showed that
there were 1,465 which ended in 0 or 5.
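
The degree of rounding in the contest, or in a list of reported ages, may be gauged by comparing the share of figures ending in 0 or 5 with the two-in-ten share that would arise if all final digits were equally common. The sketch below is illustrative: the equal-digit benchmark is an assumption, the function names are introduced here, and the list of ages is invented; only the counts 1,465 and 1,996 come from the text.

```python
from collections import Counter

def final_digit_counts(values):
    """Count how many values end in each of the digits 0 through 9."""
    counts = Counter(abs(int(v)) % 10 for v in values)
    return [counts.get(d, 0) for d in range(10)]

def share_round(counts):
    """Share of values whose final digit is 0 or 5."""
    return (counts[0] + counts[5]) / sum(counts)

EXPECTED = 2 / 10   # assumption: every final digit equally likely

# Figures reported in the text for the cranberry contest.
contest_share = 1465 / 1996
print(f"contest: {contest_share:.3f} observed against {EXPECTED:.3f} expected")

# Hypothetical reported ages, invented only to show the tally on raw data.
ages = [25, 30, 30, 31, 35, 38, 40, 40, 42, 45,
        45, 47, 50, 50, 53, 55, 60, 60, 62, 65]
age_share = share_round(final_digit_counts(ages))
print(f"ages:    {age_share:.3f} observed against {EXPECTED:.3f} expected")
```

For the contest the observed share is about 0.734, between three and four times the benchmark.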

(c) Certain types of questions should be avoided. When the prosecuting
attorney asked the alleged wife beater, “Have you stopped beating your
wife?” he attempted to put the defendant, whether he replied “Yes” or
“No,” in the position of admitting that he had beaten his wife. In a
scientific investigation we should scrupulously avoid such leading
questions. When asking the reason for unemployment in an unemployment
survey, made during a depression, an enumerator would be suggesting
the answer if he said, “I suppose you are unemployed because of the
depression?” Rather, he should inquire, “What is the reason you are
unemployed?”

Questions which are unduly inquisitive or which are liable to offend 
should likewise be avoided. In a study of social workers, each married 
woman was asked whether or not she lived with her husband. The 
inquiry was injudicious, aroused resentment, and would hardly have been 
productive of useful data if it had been answered by all the persons 
queried. Questions concerning personal matters (such as income) should 
be handled with tact, perhaps asked at the close of the interview after
the cooperation of the informant has been obtained. Sometimes it is
better not to ask such a question but to infer the general income level
from knowing if there is a telephone in the home; if the home is owned,
and its apparent value; the individual's occupation; make of car(s)
driven, if any; servants employed, if any; and so forth. The 1950 Census
of Population asked the amount of income for a twenty per cent sample
of the population and, although this question, like all Census queries,
was authorized by law, a special confidential form requiring no postage
was provided for those who preferred to send this information directly
to the Bureau of the Census. In one survey informants were asked: How
much cash do you customarily carry on your person? How much cash
do you ordinarily keep around the house? Many refusals to answer may 
be expected for such questions. 

(d) Answers should be objective and capable of tabulation. When factual 
studies are being made, questions should be so designed that objective 
answers will be forthcoming. Instead of asking the condition of a build- 
ing and allowing the enumerator to state the condition in his own words, 
a study made by the United States Department of Commerce asked if a 
structure was in good condition, needed minor repairs, needed structural
repairs, or was unfit for use. Although the answers to these questions
are not completely objective, at least they are capable of being readily
tabulated.
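
How fixed answer categories lend themselves to tabulation may be sketched as follows. The four categories are those named above; the numeric codes, the function name, and the list of enumerator entries are hypothetical and serve only to show the mechanics.

```python
from collections import Counter

# The four categories come from the text; the numeric codes are hypothetical.
CONDITION_CODES = {
    1: "Good condition",
    2: "Needs minor repairs",
    3: "Needs structural repairs",
    4: "Unfit for use",
}

def tabulate(codes):
    """Return (label, count, per cent) rows in code order; reject unknown codes."""
    if not codes:
        raise ValueError("no entries to tabulate")
    unknown = sorted(set(codes) - set(CONDITION_CODES))
    if unknown:
        raise ValueError(f"unrecognized codes: {unknown}")
    counts = Counter(codes)
    total = len(codes)
    return [(CONDITION_CODES[c], counts.get(c, 0), 100 * counts.get(c, 0) / total)
            for c in CONDITION_CODES]

entries = [1, 1, 2, 1, 3, 2, 1, 4, 2, 1]   # hypothetical enumerator entries
rows = tabulate(entries)
for label, n, pct in rows:
    print(f"{label:26s}{n:3d}{pct:7.1f}")
```

Had the enumerators described each building in their own words, no such direct count would be possible without first classifying every reply by hand.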

(e) Instructions and definitions should be concise. The enumerator and
informant should never be in doubt as to what information is desired and
what terms or units are to be used. When inquiring as to the
employment status of an individual, the inquiry must refer to some specific time.
Thus, the 1950 Census of Population asked information as of the week
preceding the visit of the enumerator.

If information is desired as to the exact situation of a part-time worker, 
it must be made clear whether the desired answer should be: (1) hours 
per day; (2) hours (or days) per week; or (3) fraction of usual full time. 

The units used in a study should be clearly understood by both the
enumerator and the informant. If we are collecting data from farmers 
and orchardists on apple production, we should specify whether we want 
data in terms of bushels or boxes of fruit. If we desire information as to 
the number of rooms in houses, it should be noted whether or not bath- 
rooms, kitchenettes, powder rooms, dressing rooms, and the like are to be 
counted as rooms. 

(f) Arrangement of questions should be carefully planned. Not only
must the questions be well arranged on the schedule form to allow proper
space for answers, but the order of the questions should be such as 
to facilitate the answering of each question in turn. If a logical flow 
of thought is involved, it should be followed in the arrangement of ques- 
tions. Questions should not skip back and forth from one topic to 
another. 

After a schedule has been drafted, the desirable procedure is to try it
out with a group, discover its shortcomings, and then revise it in the light
of the tryout. If there is not time for a tryout, ask some competent
investigators to go over it and make suggestions for its improvement.
When the final form of the schedule has been decided upon, careful
instructions for filling it out should be prepared. If the schedules are to
be mailed to the persons furnishing information, these directions should
be as clear and concise as possible. If enumerators are used, the
instructions to the enumerators should be complete in order to cover as many as
possible of the situations which may occur in their work. 

3. Selecting the type of sample. The United States Census of
Population is a complete enumeration of the inhabitants of the United
States. That is to say, it is as complete as it is possible to make it. A
very few people, such as tramps, fugitives from justice, and dwellers in
extremely remote places, may not be included, but the intent is to include
everyone, and no one is knowingly omitted. Similarly, the Census of
Agriculture undertakes to include all farms in the United States as well
as certain specialized operations,* including greenhouses, nurseries,
poultry yards, and apiaries.

Sometimes a partial enumeration is used instead of a complete
enumeration. Occasionally, only the larger units may be included. For
example, the biennial Census of Manufactures for 1921-1939 included
only those establishments with annual products valued at $5,000 or more.
The enumerations were incomplete in regard to number of establishments
included, but included a high proportion of the total number of wage
earners in manufacturing and of the total value of manufactured products.
Following 1939, no Census of Manufactures was taken until 1947, when
all establishments employing one or more persons were included. In 1949
an Annual Survey of Manufactures was instituted; the annual survey
uses a sample, employing a combination of the procedures described in
the following paragraphs.

* See U. S. Bureau of the Census, United States Census of Agriculture, 1950,
Vol. II, p. xxix.

It may be too expensive or too time-consuming to attempt either a 
complete or a nearly complete coverage in a statistical study.
Furthermore, to arrive at valid conclusions, it may not be necessary to enumerate
all or nearly all of a population. We may study a sample drawn from
the larger population and, if that sample is adequately representative
of the population, we should be able to arrive at valid conclusions. There
are various ways in which a sample may be selected from a population.
No matter which of these is employed, it must be remembered that the
cardinal purpose is to obtain a representative sample, that is, one which
contains all elements in the same proportion as in the population from
which it is drawn. In short, it is not merely a matter of grabbing any 2,
5, 10, or 20 per cent sample of a population, but of selecting that sample
in such a way that it will be as representative as possible.

(a) Random sample. If a sample is drawn in such a way that each
time an item is selected, each item in the population (or universe) has
an equal chance of being drawn, the sample is said to be a random one.
Under these conditions, each combination of a specified number of items
will have the same probability of being selected. This is sometimes
referred to as unrestricted or simple random sampling to differentiate it
from sampling procedures which combine random sampling with other
requirements, for example, the initial division of a non-homogeneous
population into appropriate homogeneous sub-groups.
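
The statement that every combination of a given size is equally likely can be checked by listing the combinations for a very small population. The five-item population and the sample size of two below are hypothetical, and the sketch assumes drawing without replacement.

```python
from itertools import combinations
from math import comb

population = ["A", "B", "C", "D", "E"]   # hypothetical five-item population
n = 2                                     # sample size

# Every possible sample of n items, ignoring order of selection.
all_samples = list(combinations(population, n))
assert len(all_samples) == comb(len(population), n)   # 10 possible samples

# Under simple random sampling each possible sample is equally likely.
prob_each_sample = 1 / len(all_samples)

# Each item appears in the same number of samples, so each item has the
# same chance of being included.
appearances = {item: sum(item in s for s in all_samples) for item in population}
prob_inclusion = {item: k / len(all_samples) for item, k in appearances.items()}
print(prob_each_sample)
print(prob_inclusion)
```

Each of the ten possible pairs has probability one-tenth, and each item is included with probability 4/10, which equals the sampling fraction n/N.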

When populations are homogeneous, in regard to the characteristic in
which we are interested, random samples may be expected to produce
satisfactory results. If, for example, a large receptacle contains a
population of thousands of marbles, a fixed fraction of which are white,
another fraction black, and the rest red,
and if those marbles are identical in size, shape, density, and all other
characteristics except color, we have a homogeneous population. If the
marbles can be thoroughly mixed, between each draw of a marble, by
rotating the receptacle, or otherwise, randomness is not too difficult to
achieve. Under the conditions indicated, it is more likely that a sample
of marbles will show the three colors in the same proportion that they
exist in the population than that these colors will be present in some other
proportion. This does not mean that every sample will show the
proportion in the population; but if many samples are drawn they will tend to
do so. Furthermore, wide disagreements will rarely occur.
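
The behavior described for the marbles can be imitated by repeated drawing from a list. In the sketch below the composition of the population (one-half white, one-third black, one-sixth red), the sample size of 200, the 500 repetitions, and the seed are all assumptions chosen for illustration, not figures from the text.

```python
import random

COLORS = ("white", "black", "red")

def draw_sample_shares(population, n, rng):
    """Draw n marbles at random without replacement and return the color shares."""
    sample = rng.sample(population, n)
    return {c: sample.count(c) / n for c in COLORS}

# Assumed composition: 3,000 white, 2,000 black, 1,000 red (illustrative only).
population = ["white"] * 3000 + ["black"] * 2000 + ["red"] * 1000

rng = random.Random(1954)   # fixed seed so that the run can be repeated
runs = [draw_sample_shares(population, 200, rng) for _ in range(500)]

mean_white = sum(r["white"] for r in runs) / len(runs)
worst_white = max(abs(r["white"] - 0.5) for r in runs)
print(f"mean white share over 500 samples: {mean_white:.3f}")
print(f"largest departure from 0.5 in any one sample: {worst_white:.3f}")
```

Individual samples differ from the population share, but their average lies close to it, and departures of more than a few percentage points are uncommon with samples of this size.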

In the illustration just given, randomness was not difficult to attain.
Suppose that a population consists of equal proportions of four sizes of 
bolts and that all were made from the same material. In such a situation, 
mixing the bolts in a container will not help us to obtain a random sample 
of the various sizes, since smaller objects tend to gravitate to the bottom. 
Satisfactory mixing might possibly be obtained on a horizontal surface, 
but here one would have to be careful not to select the larger bolts because 
they are more prominent. A somewhat similar problem is met in
sampling shipments of grain and of coal. For grain, the lack of homo- 
geneity is recognized and samples are sometimes taken by plunging a tube 
vertically into the grain in several locations. This procedure is similar to 
stratified sampling described in section (d). 

Sometimes items cannot be physically mixed, yet a random sample is
desired. Mixing may be impossible because the items are bulky,
immovable, or fragile, or because they may be households or individual
persons. Again, mixing may be possible but may not assure randomization,
since the individual selecting items from the mixed population may
not pick the items at random. Randomization is sometimes achieved by
assigning numbers to the items in the population and drawing the sample
or samples by reference to a table of random numbers.⁴ This may be
referred to as mechanical randomization, the term being also applied
to the use of coins or dice.
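Mechanical randomization can be sketched in a few lines. In this illustration a pseudo-random number generator takes the place of the printed table of random numbers; the list of households and the function name `draw_by_number` are hypothetical.

```python
import random

# Hypothetical population of units that cannot be physically mixed.
households = [f"household-{i:03d}" for i in range(1, 201)]


def draw_by_number(items, sample_size, rng):
    """Number the items 1..N, draw distinct random numbers, and return
    the (number, item) pairs that were selected."""
    numbered = dict(enumerate(items, start=1))
    chosen = sorted(rng.sample(range(1, len(items) + 1), sample_size))
    return [(number, numbered[number]) for number in chosen]


rng = random.Random(42)  # fixed seed so the selection can be reproduced
sample = draw_by_number(households, 10, rng)
for number, unit in sample:
    print(number, unit)
```

Because the selection depends only on the numbers drawn, the person taking the sample has no opportunity to favor one item over another.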

When samples are taken from each batch of screws, nails, bolts, brick,
wire, or other products of a factory, physical mixing may not be necessary,
since the items may be selected from time to time from the production
stream. Such a method of selection is not exactly random and may, in
fact, contain a bias if the machine, die, drill, jig, or other device used in
producing the items tends to wear or get out of adjustment during the
production of a batch. Selecting items from a production stream is
somewhat akin to the method next described.

(b) Systematic sample. When a sample is obtained by drawing every,
say, tenth item on a list or in a file, the sample is a systematic one. The
first item should be selected at random. Such a sample is sometimes
drawn from an alphabetical list of names or from cards filed in alphabetical,
numerical, or other order. Certain population information
called for on the schedule used for the 1950 Census of Population and
Housing was obtained for but 20 per cent of the persons listed. To
obtain this sample, every fifth line on a schedule was labeled “Sample
line . . . ask ques. below.” Five forms of the schedule were printed,
each with a different arrangement of sample lines.

⁴ For example, the table given in R. A. Fisher and F. Yates, Statistical Tables for
Biological, Agricultural and Medical Research, Hafner Publishing Company, Inc.,
New York, 1949, pp. 104-109.
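A systematic sample with a random start may be sketched as follows. The 30-line schedule and the function name are assumptions made for the illustration; the interval of 5 corresponds to the 20 per cent sample mentioned above.

```python
import random


def systematic_sample(items, interval, rng):
    """Take every `interval`-th item, beginning at a position chosen at
    random within the first interval."""
    start = rng.randrange(interval)
    return items[start::interval]


lines = list(range(1, 31))  # hypothetical schedule with 30 numbered lines
rng = random.Random(7)
sample = systematic_sample(lines, 5, rng)
print(sample)
```

Only the starting position is left to chance; once it is fixed, every later selection is determined by the interval.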

It is important that the basic list, from which a systematic sample is 
chosen, is actually the population which one desires to study. The 
failure of the Literary Digest to forecast correctly the 1936 presidential 
election was due to the fact that its apparently systematic sample of more 
than 2,300,000 ballots was not selected from an appropriate basic list. 
The voters were selected from lists of automobile owners and telephone 
subscribers, which, even more so in 1936 than would be true today,
failed to include enough of those persons in the lower income groups. 
A similarly incomplete list was used as the basis from which to draw a 
sample for an unemployment study in a New England city during the 
depression of the 1930's. The sample was selected from the subscribers
for electricity, gas, and water. The list did not include the poorest 
families. 

No general statement can be made to the effect that more reliable or 
less reliable results may be had from a systematic sample than from a 
random sample of the same size. The conditions under which systematic 
selection is to be preferred to random sampling, or vice versa, are too 
involved to be discussed here,⁵ but one caution should be mentioned.
The sampling intervals (every 5th item, every 10th item, on a list) must 
not coincide with any constantly recurring characteristics in the listing 
of the items. 
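The caution about recurring characteristics can be made concrete with invented figures. Suppose a list of daily figures in calendar order in which every seventh entry is unusually high. The data, the intervals, and the function name below are all hypothetical.

```python
# Hypothetical daily figures in calendar order: every seventh entry is high.
daily = [100, 100, 100, 100, 100, 100, 170] * 52


def systematic_mean(values, interval, start):
    """Mean of the systematic sample that begins at `start` and takes
    every `interval`-th value."""
    picked = values[start::interval]
    return sum(picked) / len(picked)


true_mean = sum(daily) / len(daily)

# Interval 7 coincides with the weekly cycle: each possible start hits the
# same day of the week every time.
coinciding = [systematic_mean(daily, 7, s) for s in range(7)]

# Interval 10 does not coincide with the cycle.
non_coinciding = [systematic_mean(daily, 10, s) for s in range(10)]

print("population mean:", true_mean)
print("interval 7:     ", coinciding)
print("interval 10:    ", [round(m, 1) for m in non_coinciding])
```

With the coinciding interval, every possible sample consists entirely of ordinary days or entirely of high days, so none of them comes near the population mean; with the other interval all ten possible samples fall close to it.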

(c) Cluster sample. Before proceeding to describe a cluster sample, it 
will be useful to introduce the term sampling unit. The sampling unit 
is the basic entity in any sample and may be a marble, a bolt, an indi- 
vidual, a manufacturing concern, a farm, a household, a geographic area, 
and so forth. In the case of the marbles, the units were simple and 
differed from each other only in regard to color. Other units may be 
complex and may differ from each other in many respects. For example, 
manufacturing concerns differ in regard to nature of product, capital 
invested, number of employees, and in many other ways. When our 
units are people, we find that they differ in respect to sex, age, race, 
occupation, employment status, economic status, religion, and so forth. 
About all that they may have in common is that they are human beings 
and live in the same community. Such differences are important and 
need to be kept in mind when a sample is selected. The more unlike 
the sampling units, the more difficult is the problem of selecting a repre- 
sentative sample. 

⁵ See M. H. Hansen, W. N. Hurwitz, and W. G. Madow, Sample Survey Methods
and Theory, Vol. I, John Wiley and Sons, New York, 1953, pp. 503-512.
The cluster sample is sometimes referred to as an area sample because
it is frequently applied on a geographical basis. Essentially it consists 
of a random selection of groups of units. For example, on a geographical 
basis, we might select blocks of a city or counties of continental United 
States. As a non-geographical illustration, the bolts of four sizes, 
previously mentioned, might be spread out on a horizontal surface marked 
into squares of equal size and a random sample of the squares taken. 
The blocks, counties, or squares constitute the clusters,⁶ and within each
group all of the units present may be included. Multi-stage sampling 
involves samples of the units from the groups, or samples of sub-groups 
from the groups (for example, townships from the counties in the cluster), 
or both. Multi-stage sampling may also include other types of samples 
in one or more of the steps. 
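The difference between a simple cluster sample and a two-stage sample can be sketched in code. The city of 40 blocks, the household labels, and both function names are hypothetical; the sketch assumes that the clusters themselves are chosen with equal probability.

```python
import random

rng = random.Random(3)

# Hypothetical city: 40 blocks, each with between 5 and 15 households.
blocks = {
    b: [f"block{b}-house{h}" for h in range(1, rng.randint(5, 15) + 1)]
    for b in range(1, 41)
}


def cluster_sample(clusters, n_clusters, rng):
    """Choose clusters at random and include every unit in each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: list(clusters[c]) for c in chosen}


def two_stage_sample(clusters, n_clusters, units_per_cluster, rng):
    """Choose clusters at random, then a random sample of units within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        c: rng.sample(clusters[c], min(units_per_cluster, len(clusters[c])))
        for c in chosen
    }


one_stage = cluster_sample(blocks, 8, rng)
two_stage = two_stage_sample(blocks, 8, 3, rng)
print(sum(len(units) for units in one_stage.values()), "households in the cluster sample")
print(sum(len(units) for units in two_stage.values()), "households in the two-stage sample")
```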

(d) Stratified sample. When a population is known to be heterogeneous,
and when that heterogeneity has a bearing on the characteristic
being studied, the population may be divided into strata and random
samples of units drawn from each stratum. The purchaser of a box of
berries recognizes the existence of heterogeneity, and thus of strata, when
she turns out the contents to examine the bottom as well as the top layers.
Frequently, the number of units selected from each stratum is proportional
to the number of units in that stratum in the population. An
interesting application of the stratified sample was made in the study of
the effects of strategic bombing on Japanese morale⁷ made by the United
States Strategic Bombing Survey. One important provision in the selection
of this sample was that interviewers could make no substitutions for
persons designated on the sampling lists. Substitution for persons not
at home, or otherwise not readily available, is a dangerous source of error
in any type of sample.
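Proportional allocation of a stratified sample can be sketched as below. The three strata, their sizes, and the function names are assumptions made for the illustration; fractional allotments are rounded by the largest-remainder rule, which is one convenient choice among several.

```python
import random


def proportional_allocation(stratum_sizes, total_sample):
    """Allot the sample to the strata in proportion to stratum size,
    rounding by largest remainder so the allotments add to the total."""
    population = sum(stratum_sizes.values())
    exact = {s: total_sample * n / population for s, n in stratum_sizes.items()}
    allotment = {s: int(x) for s, x in exact.items()}
    leftover = total_sample - sum(allotment.values())
    by_remainder = sorted(exact, key=lambda s: exact[s] - allotment[s], reverse=True)
    for s in by_remainder[:leftover]:
        allotment[s] += 1
    return allotment


def stratified_sample(strata, total_sample, rng):
    """Draw a random sample within each stratum, of the allotted size."""
    sizes = {s: len(units) for s, units in strata.items()}
    allotment = proportional_allocation(sizes, total_sample)
    return {s: rng.sample(units, allotment[s]) for s, units in strata.items()}


# Hypothetical strata; the unit identifiers are simply serial numbers.
strata = {
    "urban": list(range(6000)),
    "suburban": list(range(3000)),
    "rural": list(range(1000)),
}
sample = stratified_sample(strata, 200, random.Random(11))
print({s: len(units) for s, units in sample.items()})
```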

Note that stratified sampling cannot be used unless some information
concerning the population and its strata is available. An extremely
important point, which is often overlooked, is that the strata must be
ones which are related to the topic being studied. If we are making a
health study of male students in a college, we might recognize such strata
as those who do or do not live at home; those who are totally, partially,
or not at all self-supporting; those who do or do not take regular exercise;
those who do or do not smoke; and so forth. However, there are other
strata which clearly have no bearing on the problem. To take an extreme
illustration, we might recognize such strata as those who habitually wear
caps or hats, those who prefer single- or double-breasted coats, or any
other categories which are not related to health. Another important
consideration is that stratified sampling is most advantageous when the
strata differ from each other as much as the population will allow, but
there should be homogeneity within each stratum.

⁶ The clusters are sometimes called “primary sampling units” and the items in the
clusters termed “elementary sampling units.”

⁷ See Morale Division, the United States Strategic Bombing Survey, The Effects
of Strategic Bombing on Japanese Morale, [Washington], 1947; Appendix T. Out of
print.

Many public opinion and market research organizations make use of 
the principle of stratified sampling. Sometimes enumerators may be told 
to work within a given city block (a geographical stratum) and talk with
a given number of people selected at random. The selection, too often, 
is not a random one, consisting as it does of those who are at home, those 
willing to be interviewed, and those who, by their appearance, look as if 
they would be willing to talk. 

For a non-homogeneous population, a properly stratified sample may
be expected to yield more reliable⁸ results than a random sample of the
same size. From this it follows that the same reliability may be had from
a smaller stratified sample. There is some danger that investigators,
having an excessive feeling of security in the stratified sample, may use
samples that are too small to give statistically reliable results. This can
be guarded against by an intelligent use of the method and of the reliability
formulas.⁹ Although both proper stratification and size of sample
are important, a large sample cannot compensate for poor stratification.
Of course, a stratified sample taken from a homogeneous population is no
more reliable than a random sample of the same size.
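The gain from stratification in a non-homogeneous population can be seen in a small simulation. Everything in the sketch is assumed for the purpose: a population of two equal strata whose means differ widely, samples of 40, and the spread of the sample means over 1,000 repetitions as the measure of reliability.

```python
import random
import statistics

rng = random.Random(2024)

# Hypothetical heterogeneous population: two equal strata with very
# different means and little variation within each stratum.
stratum_a = [rng.gauss(20, 2) for _ in range(5000)]
stratum_b = [rng.gauss(60, 2) for _ in range(5000)]
population = stratum_a + stratum_b


def random_sample_mean(n):
    """Mean of a simple random sample of n units from the whole population."""
    return statistics.fmean(rng.sample(population, n))


def stratified_sample_mean(n):
    """Mean of a sample taking n/2 units at random from each stratum."""
    half = n // 2
    return statistics.fmean(rng.sample(stratum_a, half) + rng.sample(stratum_b, half))


random_means = [random_sample_mean(40) for _ in range(1000)]
stratified_means = [stratified_sample_mean(40) for _ in range(1000)]

random_spread = statistics.pstdev(random_means)
stratified_spread = statistics.pstdev(stratified_means)
print("spread of random-sample means:    ", round(random_spread, 2))
print("spread of stratified-sample means:", round(stratified_spread, 2))
```

If the two strata are given the same mean, the two spreads become nearly equal, which is the point made above about homogeneous populations.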

(e) Sequential sampling. Sequential sampling has been used most
widely in connection with quality-control schemes having to do with raw
material or a manufactured product, but it is gradually coming to have
other applications.¹⁰ It involves testing a relatively small number of
items which may lead to a decision to accept or reject the lot from which
the sample came. If the first sample leads to no clear decision, it is
enlarged (possibly one item at a time) until a decision can be made.

⁸ In this text we shall consider (in Chapters 24, 25, and 26) the error formulas for
random samples only. An understanding of the behavior of random samples is
necessary groundwork for evaluating samples obtained by more complex procedures.
Error formulas for other types of samples may be found in H. M. Walker and J. Lev,
Statistical Inference, Henry Holt and Company, New York, 1953, pp. 173-177; in
M. H. Hansen, W. N. Hurwitz, and W. G. Madow, Sample Survey Methods and Theory,
Vol. I, John Wiley and Sons, Inc., New York, 1953; and in W. G. Cochran, Sampling
Techniques, John Wiley and Sons, Inc., New York, 1953.

⁹ See footnote 8.

¹⁰ Applications in commercial research are described and the process of sequential
sampling explained in Robert Ferber, Statistical Techniques in Market Research,
McGraw-Hill Book Company, Inc., New York, 1949, Chapter VII. A more complete
explanation of sequential analysis is given by the originator, Abraham Wald,
in his book Sequential Analysis, John Wiley and Sons, Inc., New York, 1947.
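The text does not state the rule by which the decision is reached. As one possible illustration, the sketch below uses Wald's sequential probability ratio test for the fraction defective in a lot, enlarging the sample one item at a time. The acceptable and unacceptable fractions defective (`p0`, `p1`) and the two risks (`alpha`, `beta`) are hypothetical values chosen for the example.

```python
import math


def sprt(inspections, p0=0.02, p1=0.10, alpha=0.05, beta=0.10):
    """Sequential probability ratio test on a stream of inspection results
    (1 = defective, 0 = good).

    p0 is a fraction defective at which the lot should be accepted, p1 one
    at which it should be rejected; alpha and beta are the risks of wrongly
    rejecting and wrongly accepting. Returns (decision, items inspected).
    """
    reject_at = math.log((1 - beta) / alpha)   # upper boundary
    accept_at = math.log(beta / (1 - alpha))   # lower boundary
    log_ratio = 0.0
    inspected = 0
    for inspected, defective in enumerate(inspections, start=1):
        if defective:
            log_ratio += math.log(p1 / p0)
        else:
            log_ratio += math.log((1 - p1) / (1 - p0))
        if log_ratio >= reject_at:
            return "reject", inspected
        if log_ratio <= accept_at:
            return "accept", inspected
    return "undecided", inspected


print(sprt([0] * 100))          # a run of good items
print(sprt([1, 1, 0, 0, 0]))    # defectives at the outset
print(sprt([0, 0, 1, 0, 0]))    # too few items to decide
```

Each inspected item moves the running ratio toward one boundary or the other; sampling stops as soon as either is crossed, so a very good or a very bad lot is disposed of after only a few items.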

(f) Other types of samples. The five types of samples previously
described are sometimes referred to as “probability samples,” since it
is possible to ascertain the probability that an individual item is included
in the sample.¹¹ Other sampling schemes, differing from those already
described, also exist. They are not considered desirable procedures since
they involve subjective factors, or their reliability cannot be ascertained
satisfactorily, or both. Among these are: (1) the purposive sample, in
which one sets out to make a sample agree with the population in regard
to certain characteristics, for example, average income and size of family;
(2) the quota sample,¹² in which interviewers, working in a certain area,
are instructed to talk with individuals having particular characteristics
(If interviewers are told to talk with 10 native-white males, 4 Negro
males, and 3 foreign-born males, it is more than likely that the foreign-born
males interviewed will be those who are able to speak English
well enough to be conversed with satisfactorily. This would introduce
a bias into most studies, since the population actually studied would not
be the population which was intended to be studied.¹³); (3) the random
point sample, which consists of locating many points at random on a map
and enumerating a predetermined number of sampling units nearest to
each point (This procedure is occasionally used for sampling farms, but
through its use large farms are more likely to be included than are small
farms.).

When deciding which sampling plan to use, the investigator must consider
the efficiency of the scheme. It has already been noted that a
stratified sample yields more reliable results (that is, its sampling error
is smaller) than does a random sample of the same size. Cluster sampling
may be expected to yield less reliable results than random sampling for
samples of the same size. The efficiency of a sample scheme refers to
the reliability in relation to unit cost. Thus, a geographic cluster sample
with groups of units in, say, 20 locations in a large state may have a lower
cost per sampling unit than a random sample of the same size with the
units scattered here and there about the state. The difference in unit
cost may be so great that the cluster sample may be made enough larger
than the random sample so that the cluster sample will yield more reliable
results than could be had from a random sample for the same expenditure.

¹¹ See the references given in footnote 8.

¹² A good discussion of quota sampling may be found in F. Mosteller and others,
The Pre-Election Polls of 1948, Social Science Research Council, New York, 1949,
pp. 83-91 and 94-96. The danger of using a quota sample is well illustrated on
page 95.

¹³ The distinction between sampled population and target population (and other
principles of sampling) is treated in “Principles of Sampling,” by W. G. Cochran,
F. Mosteller, and J. W. Tukey, Journal of the American Statistical Association,
March, 1954, pp. 13-35.
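The comparison of efficiency can be put in rough arithmetic. The sketch below rests entirely on assumed figures: a fixed budget, a unit cost for each plan, and a factor (often called a design effect, a term not used in the text) by which the variance of a cluster sample exceeds that of a random sample of equal size.

```python
import math


def standard_error(sd, n, variance_factor=1.0):
    """Standard error of a sample mean; `variance_factor` inflates the
    variance to allow for the lower reliability of a cluster sample."""
    return sd * math.sqrt(variance_factor / n)


budget = 2000.0            # hypothetical total expenditure
sd = 12.0                  # hypothetical population standard deviation
cost_random = 10.0         # hypothetical cost per unit, scattered random sample
cost_cluster = 2.0         # hypothetical cost per unit, cluster sample
cluster_factor = 2.5       # hypothetical variance inflation for clustering

n_random = int(budget // cost_random)     # units affordable at random
n_cluster = int(budget // cost_cluster)   # units affordable in clusters

# Same size: the cluster sample is the less reliable of the two.
same_size = (standard_error(sd, n_random),
             standard_error(sd, n_random, cluster_factor))

# Same expenditure: the cluster sample can be made much larger.
same_cost = (standard_error(sd, n_random),
             standard_error(sd, n_cluster, cluster_factor))

print("same size (random, cluster):", [round(x, 3) for x in same_size])
print("same cost (random, cluster):", [round(x, 3) for x in same_cost])
```

Under these assumed costs the cluster plan buys five times as many units, which more than offsets its lower reliability per unit; with a smaller difference in unit cost the conclusion could easily be reversed.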

A sample may be selected by use of a combination of the methods
previously discussed. Here is the procedure followed by the American
Institute of Public Opinion¹⁴ in 1953:

The regular sample for the national surveys of the American Institute 
of Public Opinion is a sample of the adult population. Provision is 
made for selecting from the regular sample a sample of an approximation 
of the voting population when such is desired. The design provides
stratification by seven regions (groups of states), and within each region 
stratification by geographical distribution, three rural-urban strata, the 
census economic areas, and the size of the locality finally selected. A 
systematic sample of localities was drawn within each stratum from a 
random start with probability of selection proportional to size. Within
large urban communities sampling units¹⁵ (small clusters of blocks)
were drawn at random with probability proportional to size. In smaller 
communities and rural regions sampling areas were drawn with equal 
probability. 

Interviewers are assigned selected areas, and required to work within 
the boundaries of such areas. Each national survey uses about 150
sampling points, with equal numbers of interviews assigned to each
point. A staff of over 1,000 interviewers is maintained.

Sometimes a sample is taken in a more or less haphazard fashion. Or, 
the investigator may include the data which are convenient or readily 
available, after which he will trustingly announce that the sample so 
taken is doubtless representative of the population which he is studying. 
For example, one researcher, who had ascertained that just under 
2,500,000 children, eligible to be enrolled in high school, were not enrolled, 
desired to estimate how many of these 2,500,000 left school because of 
economic pressure. He managed to locate 16 acceptable studies concern- 
ing the reasons why students left school. These studies each included 53 
to 274 children, a total of 2,525. The studies were made in schools in IS 
different states. Negroes were studied in one instance. There were no 
figures from New York, Massachusetts, Illinois, Michigan, Wisconsin, 
Texas, and certain other populous states. Yet, because the geographical 
distribution was diverse and because large-city, small-city, and rural 
children were included, the investigator concluded: **The sample seems 
sufficiently representative of the various elenients of the population to 

By correspondence from Dr. George H. Gallup, Director of the American Insti- 
tute of Public Opinion. 

** These are appsrently “primary sampling units.” See footnote 6. 
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serve as the basis for estimation of the whole group.’^ This may or may 
not have been true. The sample was neither random, stratified, system- 
atic, nor cluster; it merely included what was available. 

As will be shown in Chapters 24, 25, and 26, for random samples, the
larger the sample, the more confidence we can place in conclusions drawn
from the sample. It will also be shown that the greater the diversity
there is in the population, the less reliability we can repose in samples of
the same size. Mere size, of course, does not assure representativeness in
a sample. A small random or stratified sample is apt to be much superior
to a larger but badly selected sample. Sometimes a test of stability is
made to determine when a sample is large enough. For example, a
sample of 1,000 may be selected from a group of voters, and 57.3 per cent
of the sample may indicate that they intend to vote for a certain candidate.
Another 1,000 may be chosen, and the two groups combined may
show 56.9 per cent. Adding another 1,000 may change the percentage to
56.8, and still another 1,000 (4,000 in all) may leave the proportion
unchanged, at 56.8. From this test, 3,000 or 4,000 would seem to be an
adequate sample from the standpoint of size. However, the test of
stability tests only stability and not representativeness. The fact that a
percentage persists essentially unchanged means merely that we are
continuing to get about the same result as before. Conceivably, the first
sample of 1,000 could have been decidedly unrepresentative (say, from
only the poorer sections of the voting population), and each succeeding
sample similarly unrepresentative.
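The test of stability amounts to watching a cumulative percentage. In the sketch below the counts for the separate groups of 1,000 are not given in the text; they are one set of figures consistent with the cumulative percentages quoted, and the tolerance used to call the percentage "stable" is likewise an assumption.

```python
def cumulative_percentages(batches):
    """`batches` is a list of (favorable, polled) counts for successive
    samples; return the favorable percentage after each batch is added."""
    percentages = []
    favorable = polled = 0
    for f, n in batches:
        favorable += f
        polled += n
        percentages.append(round(100 * favorable / polled, 1))
    return percentages


def is_stable(percentages, tolerance=0.15):
    """True when the last two cumulative percentages differ by no more
    than `tolerance` percentage points."""
    return len(percentages) >= 2 and abs(percentages[-1] - percentages[-2]) <= tolerance


# Hypothetical batch counts that reproduce 57.3, 56.9, 56.8, and 56.8.
batches = [(573, 1000), (565, 1000), (566, 1000), (568, 1000)]
running = cumulative_percentages(batches)
print(running)
print("stable after four batches:", is_stable(running))
```

The same output would result if every batch had been drawn from an unrepresentative part of the population, which is why the test speaks only to stability.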

Mention has already been made of the possibility of bias being present
in a sample. When a sample is being selected, it is important that bias
be avoided. Bias does not mean the personal bias of the investigator
which leads him deliberately to select his sample in order to show the
results he desires. That is intellectual dishonesty. Neither does it
mean that the persons answering the questions on the schedule are
biased. The avoidance of bias involves, first, that there shall be no
selective factor present in the drawing of the sample, and, second, that
there shall be no selective factor present when schedules are returned
from those persons included in the sample. In the case of the Literary
Digest 1936 straw vote, a selective factor was present because the basic
lists from which the sample was selected did not include the lower economic
levels of the population. Sometimes the basic list may be complete,
but the method of selecting the sample may introduce bias. Thus,
a selection from an alphabetical list of names may be unsatisfactory
because of nationality differences in the alphabetical distribution of
family names. Such a bias may arise if sections of the list are chosen; it
is not likely if (say) every tenth name is taken.
The second type of selective factor is frequently encountered if the
mailed-questionnaire method of collection is used. When schedules are
sent out by mail, an investigator never expects that all of them will be
returned. Since only part of the inquiries are answered, how can he be
sure that those who did answer are representative of all those to whom
schedules were sent? Often he cannot be sure; sometimes it is obvious
that they are not representative. An alumni association sent out 363
inquiries to graduates, asking each to report (anonymously) his income
for the preceding year. Replies were received from 133. It is quite
likely that a selective factor was present in these returns. Alumni who
were out of work or who had very low incomes probably did not reply.
This assumption is borne out by the data, which showed an almost complete
absence of incomes below $1,500, although the study was made in a
depression year. Conclusions based upon biased samples are, obviously,
not only useless but misleading.

4. Using the schedules to obtain the information. When agents
or enumerators take the schedules to the persons who are to furnish the
information, the enumerators may explain the purpose of the investigation
and solicit cooperation. Each question can be clearly explained as
it is asked. Obviously, enumerators must be carefully instructed before
they begin their work. Occasionally they are required to study the
schedule and printed instructions, and then to take an examination.
Enumerators should be persons of unquestioned integrity and should also
be patient, polite, and tactful. Many a person resents being bothered
to supply statistical (or other) information; some persons are reluctant;
some refuse. The enumerator should plan his interviews to consume as
little time as possible, and should bend every effort to get the desired
information if it is feasible to do so. In some instances the work of the
enumerator may be facilitated if a letter of explanation precedes the visit.
Sometimes enumerators conduct interviews and fill in the schedules afterward.
This is done on the theory that people feel more free to talk if the
remarks are not being written down at the time. It is believed, however,
that this is an undesirable procedure, especially when there are a number
of facts to be remembered and later recorded. Enumerators should
carry credentials in order that the persons visited may be satisfied as to
the official connection of the visitor. Even though an enumerator
makes his request for information as tactfully as possible, he may sometimes
meet with a refusal. Frequently another visitor with a different
approach may have better luck. It is sometimes a good plan to have
one especially qualified worker who will follow up the more difficult cases.

Occasionally an enumerator may encounter a person who is too willing
to cooperate and who wants to talk at great length about the study. In
such a situation good terminal facilities are an asset. Carl Crow states¹⁶
that Chinese, when asked certain types of questions, are apt to give
answers which they think will please the questioner. If an English
investigating commission asks young Chinese where they want to go to
school, they are likely to reply, “England.” The same author tells¹⁷
of an investigation made in Amoy, where, because of a lack of proper
death registration, the number of persons dying was estimated from
figures of the number of coffins made. The figures of coffin production
mounted, showing the development of an epidemic; but, after the
epidemic was definitely known to have declined, the figures of coffins
made remained high. Upon close inquiry it developed that the coffin
manufacturers had continued to report peak production of coffins so that
the agent of the health officials would not lose his job. They did not
want to “break his rice bowl.”

Sending schedules by mail rather than using enumerators is, at the
outset, a less expensive method of collecting data. There is also the
added advantage that the person supplying the information can fill out
the form at his convenience, instead of being disturbed by the enumerator
perhaps at a busy or inconvenient time. Furthermore, on a mail questionnaire
(provided, of course, that the informant is sure his identity is
unknown), confidential information may be given which the informant
would hesitate to divulge to an enumerator. On the other hand, a large
proportion of persons fail to reply to a mail inquiry and considerable
follow-up work may be necessary. There is also great danger that the
informant will not understand the questions, or will knowingly or otherwise
make incorrect answers. Not only must clear, concise directions be
sent with the schedule, but also a brief letter explaining the purpose of the
inquiry and requesting cooperation. A modest gift (such as the coin
sent by the Curtis Publishing Company) may insure a high proportion
of returns. In any event, an addressed and stamped (or business reply)
envelope should be included. An air mail business reply envelope (or
card) is occasionally used by investigators with the hope that it will
result in more and quicker responses. When follow-up work is necessary,
the persons who have not yet returned their forms may be sent courteous
personal letters reminding them of the inquiry and again requesting
cooperation. When appropriate, the follow-up may be by means of air
mail letters, special delivery letters, registered letters (to be sure the
communication has been delivered), telegrams, or telephone calls. Of course,
the investigator should not make a nuisance of himself; he should not be
too insistent. When only part of the schedules are finally received, it is
necessary to examine the situation carefully to be sure that no selective
factor has been present. Or, if a selective factor appears to be present,
it may be necessary to conduct a supplementary investigation to remedy
the situation.

¹⁶ Carl Crow, Four Hundred Million Customers, Harper and Brothers, New York,
1937, pp. 132-133.

¹⁷ Ibid., pp. 252-'/.

5. Editing the schedules. After the filled-out schedules are
received, a certain amount of preparatory work is necessary before the
data are in shape to be tabulated. The editorial tasks are varied. In
the case of a small study, one editor may do the entire work. In a larger
study, different phases of the editing may be portioned out among a
number of editors.

(a) Computing. It is usually better not to ask enumerators or persons
supplying information to make any computations. Thus, if information
has been obtained concerning the number of rooms in a home and the
number of members in the household, the editor may compute the ratio
of persons per room, to give some idea of crowding. If data have been
collected concerning the time lost through non-compensated accidents
and also of daily wages for each of a number of workers, the editor may
compute for each case the income lost because of accidents.
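The two computations mentioned are simple enough to show directly. The function names and the figures below are hypothetical; the point is only that the editor derives the ratio or the product from the raw entries, which the informant reports as they stand.

```python
def persons_per_room(household_members, rooms):
    """Ratio of persons to rooms, as a rough index of crowding."""
    return household_members / rooms


def income_lost(days_lost, daily_wage):
    """Income lost through non-compensated accidents."""
    return days_lost * daily_wage


# Hypothetical entries taken from two edited schedules.
print(persons_per_room(household_members=6, rooms=4))
print(income_lost(days_lost=12, daily_wage=9.50))
```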

(b) Coding. Tabulation is frequently facilitated by coding. When
machine tabulation (to be discussed shortly) is used, all entries on a
schedule are reduced to a numerical code. Even when tabulation is
manual, it may still be easier to look for a code mark (letters, numbers,
or a combination of letters and numbers) instead of attempting to read
the original entry. The work of the tabulator may be further facilitated
by the fact that the editor writes, or should write, legibly and uses a
distinctive color, often red.

The unemployment schedule is shown edited according to a numerical
code on page 38. Every entry is numerically coded, except those already
expressed as numbers, in order to facilitate tabulation by mechanical
means. Note that question 7 was self-coded. A simple code scheme for
questions 5 and 6 might appear as follows:

10. Professional
20. Clerical (not otherwise specified)
30. Domestic and personal service
40. Government employees (other than teachers)

Trade and Transportation
50. Retail and wholesale trade
51. Telephone and telegraph
52. Railway, express, gas, electric light
53. Water transportation
54. Bank and brokerage
55. Insurance and real estate
56. Other

Manufacturing and Mechanical Pursuits
60. Building trades, contractors
61. Building trades, wage earners
62. Clay, glass, and stone products
63. Food and kindred products
64. Iron, steel, and their products
65. Metal products, other than iron and steel
66. Paper, printing, and publishing
67. Wearing apparel and textiles
68. Automobiles, parts, and tires
69. Lumber and furniture
70. Airplanes
71. Other manufacturing and mechanical pursuits

75. Labor (not otherwise specified)
80. Self-employed (other than 10 or 60)
90. Miscellaneous employments not classified above
00. Not reported
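A code scheme of this kind is in effect a lookup table. The sketch below carries only a few of the listed codes, and the matching rule (ignoring case and surrounding blanks) and the function name are assumptions; an entry that is blank receives the "not reported" code, while one that cannot be matched is returned as `None` so that the editor can decide how to classify it.

```python
# A few entries from the code scheme; the full list would be entered the same way.
INDUSTRY_CODES = {
    "professional": "10",
    "retail and wholesale trade": "50",
    "water transportation": "53",
    "food and kindred products": "63",
    "automobiles, parts, and tires": "68",
}
NOT_REPORTED = "00"


def code_entry(entry):
    """Return the numerical code for a written industry entry.

    Blank entries are coded as not reported; entries that match nothing
    in the scheme return None and are left for the editor's judgment."""
    normalized = entry.strip().lower()
    if not normalized:
        return NOT_REPORTED
    return INDUSTRY_CODES.get(normalized)


for written in ["Professional", "  Water transportation ", "", "Beekeeping"]:
    print(repr(written), "->", code_entry(written))
```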

(c) Deciphering. The handwriting of an enumerator or of an informant
may occasionally be difficult to read. This is especially true when
an enumerator makes entries on a schedule while he is outdoors in the
rain or snow. Deciphering such copy is the editor's task; he not only
saves time for the tabulator, but also insures accurate results. If entries
are literally unreadable, the schedule may have to be referred back to the
enumerator or the person who sent in the information.

(d) Checking. The editor may look over the schedules for inconsistencies.
Entries of age and date of birth may disagree. Something
is probably awry if an individual reported as aged 8 is also shown to be
married. Similarly, a mistake has probably (though not necessarily)
been made if a woman is reported working full time as a blacksmith.
Such entries must be verified if they are to be used.
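Checks of this sort can be written as rules applied to each schedule. The field names, the survey year, and the age below which a report of marriage is questioned are all assumptions made for the sketch; the rules only flag entries for verification and do not alter them.

```python
def find_inconsistencies(card, survey_year=1954, minimum_married_age=14):
    """Return a list of reasons why the entries on one schedule should be
    verified. `card` is a dict of edited entries; missing entries are skipped."""
    problems = []
    age = card.get("age")
    birth_year = card.get("birth_year")

    # Age and date of birth should agree to within a year.
    if age is not None and birth_year is not None:
        if abs((survey_year - birth_year) - age) > 1:
            problems.append("age and date of birth disagree")

    # A very young person reported as married is probably an error.
    if age is not None and age < minimum_married_age:
        if card.get("marital_status") == "married":
            problems.append("reported as married at an implausible age")

    return problems


cards = [
    {"age": 8, "birth_year": 1946, "marital_status": "married"},
    {"age": 30, "birth_year": 1930, "marital_status": "single"},
    {"age": 41, "birth_year": 1913, "marital_status": "married"},
]
for card in cards:
    print(card, "->", find_inconsistencies(card))
```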

(e) Examining for completeness. The editor must also scrutinize the
schedule to see if any entries are missing or incomplete. If the missing
information is important, the schedule must be referred back to the
enumerator or to the informant. Otherwise, the editor writes “N.R.”
(not reported) or the corresponding numerical code in place of the missing
information.

Area                Household
Card                Enumerator

1. Relation to head of household

2. Age        3. Sex        4. Race

5. Regular employment:
     Occupation
     Industry

6. Present employment:
     Occupation
     Industry

7. Circle one number to indicate what this person was primarily doing during
   the week ending March 20, 1954:
     01 Working for compensation in money or “kind.”
     02 Self-employed.
     Has a job or is self-employed, but not at work because:
     03 On vacation.
     04 Bad weather.
     05 Labor dispute.
     06 Layoff of 30 days or less.
     07 Own sickness.
     08 Other.
     09 Not at work, new job to begin within 30 days.
     10 Not at work, looking for work.
     11 Casual worker, no regular job.
     12 Attending school.
     13 In the armed forces.
     14 Keeping house (not as employee).
     15 Unpaid worker on family farm or in family business.
     16 Volunteer worker, not on family farm or in family business.
     17 Retired.
     18 Physically or mentally unable to work.
     19 Inmate of institution.
     20 Other.

8. If this person worked at all last week, for compensation, or on family farm
   or in family business, or as a self-employed person, how many hours did he
   or she work?        hours.

9. If this person was looking for work, how many weeks has he or she been
   seeking employment?        weeks.

Urbantown Employment-Unemployment Study, 1954

Edited Urbantown Employment-Unemployment Schedule.

6. Organizing the data. After the schedules have been edited, the
data must be organized before finished tables and charts can be made.
There are three methods that may be used:

(1) The score or tally sheet. For purposes of illustration, let us consider
a score sheet to show, by industry, for male heads of households the
number of hours worked during the week ending March 20, 1954. The
score sheet is shown on page 40 and represents the data from all of the
edited cards for male heads of households from one area of the community.
The numerical coding of the industry groups is not necessary
for hand tabulation (which includes both scoring and hand sorting,
described in the next subsection), but it saves space in the tally sheet to
use the code numbers instead of the full industry designation. Numerical
coding is necessary when mechanical tabulation is employed.

Observe that the score marks are arranged in groups of five, four
vertical and a diagonal. This facilitates counting. The second set of score
marks is for checking purposes. Since the tally sheet is for but one area,
it is necessary to combine the results from a number of such tally sheets
to arrive at the figures for the entire community. The resulting table
might appear as in Table 2.1.

The score sheet is a useful device for organizing information from a
small number of schedules. However, if there are many schedules to be
scored or if it is desired to subdivide classifications, the score sheet
becomes cumbersome. For example, if we wish to use the same categories of
hours as shown on the score sheet but to show also males and females and
at the same time distinguish between those who are heads of households and
those who are not, we might have two major categories, “head of
household” and “not head of household.” Each of these would be divided
into “male” and “female,” and each of these four categories further
subdivided into the classes shown in the tally sheet on page 40. This would
call for 4 × 6 = 24 columns and would result in a very sizeable tally
sheet. It could, of course, be broken down into several score sheets, but
it would be even better to use a different method of organizing the data.
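The scoring procedure just described can be sketched in a few lines of Python. This is only an illustration: the six hour classes and their limits are assumed (the class limits of the score sheet on page 40 are not reproduced here), and the names `hour_class`, `tally`, and `score_marks` are hypothetical.

```python
from collections import Counter
from itertools import product

# Assumed hour classes; the actual limits on the score sheet may differ.
HOUR_CLASSES = ["under 15", "15-29", "30-39", "40", "41-48", "over 48"]


def hour_class(hours):
    """Place a number of hours worked into one of the assumed classes."""
    if hours < 15:
        return "under 15"
    if hours < 30:
        return "15-29"
    if hours < 40:
        return "30-39"
    if hours == 40:
        return "40"
    if hours <= 48:
        return "41-48"
    return "over 48"


def tally(cards):
    """Count cards by (industry code, hour class), as on a score sheet."""
    counts = Counter()
    for industry_code, hours in cards:
        counts[(industry_code, hour_class(hours))] += 1
    return counts


def score_marks(count):
    """Show a count as score marks in groups of five (four strokes and a diagonal)."""
    groups = ["||||/"] * (count // 5)
    if count % 5:
        groups.append("|" * (count % 5))
    return " ".join(groups)


# Column headings needed for the subdivided sheet discussed above.
columns = list(product(["head of household", "not head of household"],
                       ["male", "female"],
                       HOUR_CLASSES))

# Illustrative cards: (industry code, hours worked).
cards = [(12, 40), (12, 40), (12, 44), (7, 10), (12, 40)]
counts = tally(cards)
print(len(columns))
print(score_marks(counts[(12, "40")]), score_marks(7))
```

Combining the tally sheets of several areas amounts to adding their counters together, which `Counter` supports directly with `+`.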

(2) Hand sorting. When a study does not involve too large a number
of schedules, and when the schedules are small enough and on cardboard
or heavy paper, so that they can be handled readily, the data may be
organized by a process of manual sorting. If we wished to obtain the
information mentioned in the preceding paragraph, we might: (1) sort
the cards into four piles (male heads of households, female heads of
households, male non-heads, and female non-heads); (2) sort each of the
four piles into the 27 industry categories, giving a maximum of 108 piles;
and (3) sort each of these piles into the hours-of-work categories shown
on page 40. The cards in each pile would then be counted to obtain the
desired figures.
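The three successive sorts can be imitated as follows. This is a minimal sketch under stated assumptions: each card is represented as a dictionary with hypothetical keys (`head`, `sex`, `industry`, `hours`), and the two hour classes used here are invented for brevity.

```python
from collections import defaultdict


def sort_into_piles(cards, key):
    """Deal the cards into piles according to the value of key(card)."""
    piles = defaultdict(list)
    for card in cards:
        piles[key(card)].append(card)
    return dict(piles)


def hour_class(hours):
    """Assumed two-way split; the score sheet uses finer classes."""
    return "under 40" if hours < 40 else "40 and over"


def hand_sort(cards):
    """Sort by household status and sex, then industry, then hours; count each pile."""
    counts = {}
    first = sort_into_piles(cards, lambda c: (c["head"], c["sex"]))
    for group, pile in first.items():
        second = sort_into_piles(pile, lambda c: c["industry"])
        for industry, subpile in second.items():
            third = sort_into_piles(subpile, lambda c: hour_class(c["hours"]))
            for hours, final_pile in third.items():
                counts[(group, industry, hours)] = len(final_pile)
    return counts


# Four status-and-sex piles times 27 industries gives the maximum after step (2).
MAX_PILES_AFTER_SECOND_SORT = 4 * 27

cards = [
    {"head": True, "sex": "male", "industry": 12, "hours": 40},
    {"head": True, "sex": "male", "industry": 12, "hours": 44},
    {"head": False, "sex": "female", "industry": 3, "hours": 20},
]
counts = hand_sort(cards)
print(MAX_PILES_AFTER_SECOND_SORT, counts)
```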

Industry and Hours Worked, Male Heads of Household.

(3) Mechanical tabulation. Mechanical tabulation involves the same
basic procedure as hand sorting, but it is much faster. Mechanical
sorting and tabulating (counting and totaling) devices enable the work
of organizing the information of a statistical study to be done most
expeditiously, provided, of course, that the study is extensive enough to
warrant the use of such equipment. The use of mechanical tabulating
equipment is recommended when there is a large number of schedules to
be analyzed or when there are numerous entries on each schedule. The
process consists essentially of the following steps:

(a) Transforming all entries on the schedule into numerical terms,
using appropriate codes.

(b) Recording these entries on a punch card by punching holes to
represent the code numbers. A card-punch machine is shown on
page 43.

(c) Sorting the cards and assembling the data by the use of machines.

On page 44 there is shown a blank punch card and also an enlarged
portion of a card, punched to represent the data of the edited schedule on
page 38. The first entry on the card (103) identifies the area from which
the schedule came. The next entry, using four columns, identifies the
household and enables the cards for each household to be brought
together, if desired. The following two columns indicate the number of
the card within the household, since there may be several cards for a
household. The first nine numbers taken together make it possible to
bring together any schedule and the punch card made from it, if desired.
The next column shows by a “1” that the individual was the head of a
household; a “2” would indicate that he was not a head. Age is shown
in the two following columns. In the next column, “1” indicates that
the respondent was male; for a female, “2” is punched. The next
column indicates race by these numbers: 1, native white; 2, native
colored; 3, foreign born; 4, other; 0, not reported. The industry code,
which has already been given, occupies the next four columns, two
columns for regular employment and two for present employment. Two
more columns take care of the answers to the self-coding Question 7.
Question 8 calls for a numerical answer, which occupies the next two
columns. The last three columns take care of the numerical answers to
Question 9. Note that it was necessary to use only part of the punch
card for this schedule.
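The column layout described above is, in present-day terms, a fixed-width record. The sketch below follows the field widths given in the text; the field names, the functions `punch` and `read_card`, and every value in the sample record except the area code 103 are assumptions made for illustration.

```python
# (field name, number of card columns), in the order described in the text.
LAYOUT = [
    ("area", 3),
    ("household", 4),
    ("card", 2),
    ("relation", 1),          # 1 = head of household, 2 = not a head
    ("age", 2),
    ("sex", 1),               # 1 = male, 2 = female
    ("race", 1),
    ("industry_regular", 2),
    ("industry_present", 2),
    ("status", 2),            # answer to Question 7
    ("hours", 2),             # answer to Question 8
    ("weeks", 3),             # answer to Question 9
]


def punch(record):
    """Turn a coded schedule into the string of digits punched on the card."""
    columns = []
    for name, width in LAYOUT:
        digits = str(record[name]).zfill(width)
        if len(digits) != width:
            raise ValueError(f"{name}={record[name]} does not fit in {width} column(s)")
        columns.append(digits)
    return "".join(columns)


def read_card(card):
    """Recover the coded entries from a punched card."""
    fields, position = {}, 0
    for name, width in LAYOUT:
        fields[name] = int(card[position:position + width])
        position += width
    return fields


# Hypothetical schedule; only the area code is taken from the text.
schedule = {
    "area": 103, "household": 27, "card": 1, "relation": 1, "age": 45,
    "sex": 1, "race": 1, "industry_regular": 12, "industry_present": 12,
    "status": 1, "hours": 40, "weeks": 0,
}
card = punch(schedule)
print(card, len(card))
```

The first nine digits of `card` (area, household, and card number) serve as the identifier that ties a card back to its schedule.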

After the cards have been prepared, they are verified. This is
accomplished by reading each punched card against the schedule represented
by it. The cards are examined by placing them over a source of light
or over a black background. Alternatively, a special machine called a
“verifier” may be used. The verifier resembles the card-punch machine,
but it does not punch the cards.

Following verification, the cards are sorted and tabulated by machine.
The “electronic statistical machine,” shown on page 43, performs both
of these operations. In addition to sorting, it will count and total and
then print the results, for as many as 60 classifications, on the two rolls
of paper shown at the rear of the machine. This machine will also check
cards for consistency of information [see paragraph (d) under “Editing”]
based on pre-established criteria.

The devices pictured here are available from the International Business
Machines Corporation, 590 Madison Ave., New York City. Punched card
equipment may also be had from Remington Rand, Inc., 315 Fourth Ave.,
New York City; Burroughs Corporation, Special Machines Department,
219 Fourth Ave., New York City; and Underwood Corporation, Samas
Punched Card Division, One Park Ave., New York City.

A Punch Card.

A Portion of a Punch Card, Showing How the Edited Schedule on
Page 38 Would Be Recorded. (Fields labeled on the card: Area, Household,
Relation to head, Race, Sex, Industry (regular), Industry (present),
Employment status, Hours worked, Weeks seeking work.)

A simple device, useful for small studies, is known as Keysort* and
employs cards having holes around the edges. Information is recorded
by notching away the portion of the card between the hole and the edge.
Notched and unnotched cards are separated by means of a large sorting
needle.

* The Keysort is sold by the McBee Company, 295 Madison Ave., New York, N. Y.

7. Presentation and analysis. After the information on the
schedules has been organized by manual or mechanical means, the
finished statistical tables and charts may be drawn up. Statistical tables
are discussed in Chapter 3. Graphic presentation is considered in
Chapters 4, 5, and 6. The analysis of statistical data is treated in
Chapter 7 and the chapters that follow.

USING EXISTING SOURCES

Primary versus secondary sources. As pointed out at the beginning
of this chapter, statistical data may already exist which are suitable for
use in a projected study. The data may or may not have been published.
They may have been collected by an individual, a business firm, a research
organization, a trade association, a local, state, or federal government
office, a newspaper or magazine, and so forth. Some publications, such
as the volumes of the United States Census of Population and Housing,
contain only data which were collected by the issuing organization. Such
sources are designated as primary. Other publications bring together
data some or all of which were originally compiled by organizations other
than the one responsible for the publication. These are referred to as
secondary sources. The Survey of Current Business, published monthly by
the Office of Business Economics of the U. S. Department of Commerce, is
a secondary source, as it includes data from many governmental and
non-governmental sources. Obviously it is preferable to make use of a
primary source whenever possible, but it may often be more convenient
to make use of a secondary source. One invaluable secondary source of
data is the Statistical Abstract of the United States, issued annually by the
U. S. Bureau of the Census. A number of other sources which are
available in many libraries are listed in Appendix U.

The reasons for preferring a primary source are: 

(1) The secondary source may contain mistakes due to errors in tran- 
scription made when the figures were copied from the primary source. 

(2) The primary source frequently includes definitions of terms and
units used. This is an important consideration, since intelligent use can
hardly be made of data unless the user knows exactly what is meant by
each term or unit employed by the collecting agency. When data are
taken from several sources, it is particularly important that definitions
of terms and units be scrutinized. The term “family” may sometimes
have the limited meaning of father, mother, and offspring; sometimes it
may be used more or less synonymously with “household.” The term
“exports” may sometimes refer to gross exports (including re-exports);
sometimes, to exports of United States merchandise only. Although a
measured bushel is 2,150.4 cubic inches, a bushel does not represent the
same number of pounds for all commodities. For example, a bushel of
green peanuts in the shell weighs 22 pounds, a bushel of oats weighs 32
pounds, and a bushel of apples weighs 45 pounds; but a bushel of wheat,
beans, peas, or potatoes weighs 60 pounds. The Statistical Abstract of
the United States, although a secondary source, includes the necessary
definitions of units.

(3) The primary source often includes a copy of the schedule and a 
description of the procedure used in selecting the sample and in collecting 
the data; the reader is thus enabled to ascertain how much confidence 
may be reposed in the findings of the study. 

(4) A primary source usually shows the data in greater detail. A
secondary source often omits part of the information or combines
categories, such as showing counties instead of townships, or states instead
of counties.
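The point made under (2) about bushels can be put in a short sketch: the same count of bushels stands for very different weights depending on the commodity. The pounds-per-bushel figures are those quoted in the text; the dictionary and function names are hypothetical.

```python
# Pounds per bushel, as quoted in the text.
POUNDS_PER_BUSHEL = {
    "green peanuts in the shell": 22,
    "oats": 32,
    "apples": 45,
    "wheat": 60,
    "beans": 60,
    "peas": 60,
    "potatoes": 60,
}


def bushels_to_pounds(commodity, bushels):
    """Convert bushels to pounds for a commodity listed in POUNDS_PER_BUSHEL."""
    return bushels * POUNDS_PER_BUSHEL[commodity]


for commodity in ("oats", "apples", "wheat"):
    print(commodity, bushels_to_pounds(commodity, 1_000))
```

A thousand bushels of wheat thus weigh nearly twice as much as a thousand bushels of oats, which is why figures quoted "in bushels" from different sources cannot be added or compared until the commodity and the unit definition are known.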

Suitability of data. The analyst should not make use of data, from 
either a primary or a secondary source, without assuring himself as to 
the reliability, accuracy, and applicability of the data. There are 
numerous points worthy of consideration here: 

(1) If the enumeration was based on a sample, was the sample
representative?

(2) Was the schedule well designed? Were any leading questions or
ambiguous questions included?

(3) Was the collecting agency unbiased, or did it “have an axe to
grind”? It is well to remember that bias may enter either consciously or
unconsciously.

(4) Was a selective factor introduced because of careless enumeration?
For example, in an unemployment study, canvassers might be careless
about following up their calls at houses where no one was at home, and
thus perhaps the data would show a smaller number of employed persons
than actually existed.

(5) Were the enumerators capable and properly trained? Incompetent
or poorly trained enumerators cannot be depended upon to produce
useful results.

(6) Was the editing carefully and conscientiously done? Careless
coding or computing on the part of editors may render of little value the
findings of an otherwise valuable study.

(7) Was the tabulating (tally sheets, sorting, or mechanical
tabulations) performed with care and accurately checked?

(8) In view of the definitions used, the area studied, and the methods
of procedure, are the data applicable to the problem that is under
investigation?

It is not always possible to ascertain the quality of work which was
done by enumerators, editors, and tabulators. As just noted, primary
sources are apt to reproduce a copy of the schedule used and give a more
or less adequate description of the methods and procedures followed.
Additional information may frequently be had by correspondence.

When using data over a period of years from a given source, we must be
sure that definitions of terms have not changed or, if they have changed,
must make due allowance for the change if it is possible to do so. For
example, a new definition of the urban population was used for the 1950
Census of Population. We shall not take the space to give the old and
new definitions* in this text, but the object of the change was to include,
as urban, more of the large and densely settled, unincorporated places,
such as fringe areas around cities and unincorporated places of 2,500 or
more inhabitants outside of an urban fringe. Data for 1950 were
tabulated on the basis of both the old and the new definitions and showed
an urban population of 88,927,464 using the old definition and 96,467,686
on the basis of the new definition. For preceding censuses, data are
available only upon the basis of the old definition.

* The new definition and the nature of the change are given in U. S. Bureau of the
Census, U. S. Census of Population: 1950, Vol. II, Characteristics of the Population,
Part 1, U. S. Summary, pp. 9-10.

Newspapers are not ordinarily good sources of statistical data,
particularly when the figures are in a news item. One reason for this is that
newspaper copy is prepared and printed so rapidly that the material
cannot be as carefully proofread as can the contents of magazines and
books. In addition, many figures quoted in news items are taken from
speeches or statements from individuals who are themselves sources of
dubious reliability. As an example consider this statement, made in a
news item in one of the country's leading newspapers: “The estimated
1952-53 (Australian) wool clip is 3,740,000 bales, the largest on record.
Competent observers consider destruction of the rabbits (which ate grass
intended for sheep) has added 25,000,000 bales to the clip.” There is
no way of ascertaining, from the news item, which figure is correct.
However, the first figure is approximately right, the second figure being
grossly incorrect.

Comparability of data from different sources. When data are to
be drawn from two or more sources, the reliability of each source must be
considered and, in addition, the user must be sure that the data from the
different sources are comparable. Let us list some of the reasons for lack
of comparability:

(1) Different definitions of terms may have been used. Coal
production is given by the United States Bureau of Mines in short tons of 2,000
pounds, while at one time exports of coal were shown by the Bureau of
Foreign and Domestic Commerce in long tons of 2,240 pounds. Short
tons are now used by both bureaus. United States stocks of raw and
refined sugar are reported by the Department of Agriculture in short tons;
Cuban stocks of raw sugar are given by the Weekly Statistical Sugar Trade
Journal in Spanish tons. A Spanish ton contains 2,271.64 English
pounds. As if these three sorts of tons were not sufficiently confusing, it
is necessary to be aware of two other “tons” used in shipping. These are
the gross ton and the net (or registered) ton, each of which represents
100 cubic feet. Gross tonnage is the capacity of the hull plus the enclosed
spaces above deck available for cargo, stores, passengers, and crew;
whereas net tonnage is the gross tonnage less the space occupied by
propelling machinery, fuel, crew quarters, master's cabin, and navigation
spaces; in other words, approximately the space available for cargo and
passengers.

Because of different accounting systems, the term “profit” may have
different meanings in different industries. Profit for a railroad may be
quite different from profit for a department store. In a certain industry,
carried on almost solely by partnerships, an investigator found that many
firms showed little or no profit and that great differences were present
among firms. The partners were frequently paying themselves generous
salaries, and therefore a new term, “profit plus partners' salaries,” was
used for the study. Ages may be reported as of the last birthday; as of
the nearest birthday; or, in Oriental fashion, as of the next birthday.
Comparability of age data is thus affected by the bases of reporting.

(2) Different methods of computation or estimation may have been
employed. For example, the methods of estimating population were
responsible for two different inter-censal estimates of the July 1, 1935,
population of Yonkers, N. Y. One organization announced the
population to be 144,233 while another estimated it as 157,455. The lower
estimate assumed that Yonkers had grown, since 1930, by the same
percentage as had the United States, the growth of the United States being
determined by considering the excess of births over deaths and figures of
net immigration. The second estimate appears to have been arrived at
by assuming that the percentage change in the population of Yonkers
from 1930 to 1935 was about one-half of the percentage change from 1920
to 1930.

(3) The samples may have been so chosen that the results are not
comparable. Or, perchance, one study may have been based on a sample
whereas the other was a complete enumeration. It is, of course, possible
so to choose a sample that the results of a study may be forced to fit a
preconceived idea.

(4) Different standards of accuracy may have prevailed with respect
to enumeration, editing, and tabulating.

(5) The sources may not be comparable in respect to areas included,
or in respect to the period of time to which they refer. When the
chronological difference is not too great, comparisons may sometimes be made or
adjustments effected.
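The two inter-censal estimating methods described under (2) can be set side by side in a brief sketch. The census counts and the national growth rate used below are round, hypothetical numbers chosen only to show how far apart the two methods can fall; they are not the figures for Yonkers or for the United States, and the function names are assumptions.

```python
def estimate_by_national_growth(last_census, national_growth):
    """Assume the city grew since the last census by the same percentage as the nation."""
    return round(last_census * (1 + national_growth))


def estimate_by_half_prior_decade(previous_census, last_census):
    """Assume five-year growth equal to one-half of the preceding decade's percentage change."""
    prior_change = (last_census - previous_census) / previous_census
    return round(last_census * (1 + prior_change / 2))


# Hypothetical inputs, for illustration only.
previous_census = 100_000   # count two censuses back
last_census = 130_000       # count at the last census
national_growth = 0.04      # assumed national growth since the last census

low = estimate_by_national_growth(last_census, national_growth)
high = estimate_by_half_prior_decade(previous_census, last_census)
print(low, high, high - low)
```

With these inputs the two methods differ by more than ten per cent of the lower estimate, although both start from the same census counts. Figures from the two estimating organizations would therefore not be comparable without knowing which method each had used.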

Whether an investigator is using primary or secondary sources, it is
necessary to keep on the lookout for obvious mistakes and misprints.
For example, a secondary source states that in Continental United States,
in 1930, potential water power amounting to 38,110,000 horse power was
available 90 per cent of the time, while potential water power of 9,166,000
horse power was available 50 per cent of the time. It is clear that there
must be a greater potential horse power available for 50 per cent of the
time than for 90 per cent of the time. Data were given for each state and,
if these details are added, it appears that 59,166,000 horse power of
potential water power were available 50 per cent of the time. Obviously this
was a typographical mistake which occurred in printing the publication,
or possibly was carried over from the primary source. Such an apparent
contradiction would be observed at once by the experienced user of
figures.
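The reasoning in the water-power example is a simple consistency check and can be written out as one. The three figures are those quoted above; the function name and the dropped-digit test are illustrative assumptions, not a procedure given in the text.

```python
def availability_is_consistent(power_90_pct, power_50_pct):
    """Power available 50% of the time cannot be less than power available 90% of the time."""
    return power_50_pct >= power_90_pct


PRINTED_90 = 38_110_000      # horse power available 90 per cent of the time
PRINTED_50 = 9_166_000       # printed figure for 50 per cent of the time
STATE_SUM_50 = 59_166_000    # sum of the state details for 50 per cent of the time

printed_ok = availability_is_consistent(PRINTED_90, PRINTED_50)
corrected_ok = availability_is_consistent(PRINTED_90, STATE_SUM_50)

# A dropped leading digit leaves the remaining digits unchanged.
looks_like_dropped_digit = str(STATE_SUM_50).endswith(str(PRINTED_50))

print(printed_ok, corrected_ok, looks_like_dropped_digit)
```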
METHODS OF PRESENTATION

Four methods of statistical presentation are available. Data may be
(1) incorporated in a paragraph of text, (2) put into tabular form, (3)
placed in a semi-tabular arrangement, or (4) expressed graphically.

Text presentation. Combining figures and text is not a particularly
effective device, since it is necessary to read, or at least scan, all of the
paragraph before one can grasp the meaning of the entire set of figures.
Most persons cannot easily comprehend the data when set forth in this
manner, and it is especially difficult for the reader to single out individual
figures. There is the advantage, however, that the writer can direct
attention to, and thus emphasize, certain figures and can also call
attention to comparisons of importance. Following is an example of text
presentation:

The 1950 Census of Population of the United States enumerated
665,149 males and 659,940 females in Colorado. This state, the most
populous in the Mountain division, had 568,778 male and 554,518
female inhabitants in 1940. Next in population to Colorado, at the time
of the Seventeenth (1950) Census, was Arizona, which had 379,059
males and 370,528 females. At the 1940 enumeration, Arizona had but
258,170 males and 241,091 females, a smaller total population than
Utah, New Mexico, Montana, or Idaho. In 1950, Utah was third
among the Mountain states, with 347,636 males and 341,226 females.
At the time of the Sixteenth Census, Utah showed 278,620 males and
271,690 females. Fourth, in 1950, was New Mexico, with its population
consisting of 347,544 males and 333,643 females. At the preceding
census, New Mexico had 271,846 males and 259,972 females. Montana
was next in population after New Mexico, in 1950, with 309,423 males
and 281,601 females. In 1940 Montana's population consisted of
299,009 males and 260,447 females. Idaho followed Montana with
303,237 males and 285,400 females in 1950. A decade earlier, Idaho
had 276,579 males and 248,294 females. Next to smallest in the
division in regard to population was Wyoming, where 154,853 males
and 135,676 females were enumerated in 1950. Ten years before,
Wyoming had 135,055 males and 115,687 females. Least populous of
all the Mountain states was Nevada, with 85,017 males and 75,066
females in 1950 and 61,341 males and 48,906 females in 1940.

Tabular presentation. The same data that were included in the
preceding text statement are shown in Tables 3.1 and 3.3. This method
of setting forth statistical data is usually superior to the use of text. A
table with its title should be fully self-explanatory, although it may
frequently be accompanied by a paragraph of interpretation or a paragraph
directing attention to important figures.


TABLE 3.1

Number of Inhabitants in the States of the Mountain Division,
by Sex, 1940 and 1950

                        Male                    Female
State             1950        1940        1950        1940

Colorado        665,149     568,778     659,940     554,518
Arizona         379,059     258,170     370,528     241,091
Utah            347,636     278,620     341,226     271,690
New Mexico      347,544     271,846     333,643     259,972
Montana         309,423     299,009     281,601     260,447
Idaho           303,237     276,579     285,400     248,294
Wyoming         154,853     135,055     135,676     115,687
Nevada           85,017      61,341      75,066      48,906

Data from U. S. Bureau of the Census, U. S. Census of Population: 1950,
Vol. II, Characteristics of the Population, in the Part for each state.

It is readily seen that the table is much briefer than the text statement,
since the row and column headings eliminate the necessity of repeating
explanatory matter. As no text appears with the figures, the
presentation is more concise. The logical arrangement of items in the stub (the
left-hand column and its heading) and box head (the headings of the other
columns) makes a table clear and easy to read. The use of columns and
rows for the figures facilitates comparisons.

In Table 3.2 the various parts of a table have been slightly separated
and labeled for identification. A table will have at least the four
essentials: title, stub, box head, and body. There may also be present a
prefatory note (see Table 12.2 or 12.3) and one or more footnotes, as in Table
3.2. If the figures in the table are not original, a source note is also
included, sometimes with the prefatory note but usually below the table
and below the footnotes to the table, if any are present.

TABLE 3.2

Population and Area of the United States, Territories, Possessions,
and Other Holdings, 1950

(The parts of the table are shown separated and labeled: title, stub, box
head, body, footnotes, and source note.)

* For a list of the islands, banks, reefs, and cays included in this category, see the
source given below. For some islands the area was not available.

† Under jurisdiction of the United States by treaty with the Republic of Panama.

‡ Leased from the Republic of Nicaragua. Population data are those of the
May 1950 census of the Republic of Nicaragua.

§ Excludes citizens abroad on private business, travel, and so forth, many of
whom were enumerated at their usual place of residence. Population data estimated
from a sample.

** Less than one-hundredth of one per cent.

Source: Data from U. S. Bureau of the Census, U. S. Census of Population: 1950, Vol. I,
Number of Inhabitants, Table 1 of United States Summary.

Semi-tabular presentation. When only a few figures are to be used
in a discussion, the text may be broken and the data listed as follows:

The number of passenger-miles flown per passenger fatality, by
scheduled domestic air lines, was

87,118,581 in 1950,

79,111,993 in 1951,

282,530,326 in 1952.

This method is not often used, but it is serviceable in that the figures
are made to stand out from the text as they would not do if worked into
one or two sentences. Incidentally, the figures can be more readily
compared than if they were in the text.

Graphic presentation. Graphic devices are extremely useful and
effective for quickly presenting a limited amount of information. The
three following chapters deal with curves, bar charts, maps, and other
statistical diagrams.

LEADING CONSIDERATIONS

Types of tables. From the point of view of usage, there are two types
of tables. In the first place there are general or reference tables, which
are used as a repository of information. These are frequently very
extensive, covering many pages, as, for example, United States Table 19
in Population Volume I of the 1950 Census, which covers 13 pages. Such
tables give detailed information arranged for ready reference. In a
general table no attempt is made to arrange the entries so that emphasis
will be placed on certain items, nor is there usually any reason for
arranging columns and rows in order to bring out comparisons desired by the
investigator. The primary, and usually sole, purpose of a reference table
is to present the data in such a manner that individual items may be
found readily by a reader. Reference or general tables are often placed
in an appendix or a separate part of a published report.*

* See, for example, 5 of the Annual Report of the Federal Deposit Insurance
Corporation for the Year Ended December 31, 1952.

In the second place there are summary or text tables, which are usually
relatively small in size and which are designed to set forth one finding or a
few closely related findings as effectively as possible. While the reference
table may be rather complicated, with subheadings and sub-subheadings
in stub and caption, the summary table should be relatively simple in
construction. It frequently accompanies a text discussion and hence is
also referred to as a text table. If a reader is expected to divert his
attention from a running discourse to a table, it is essential that the table be
not too formidable, but simple and easy to understand. Too many
readers have a tendency to skip all the tables in a report. This tendency
can be combatted successfully only by making tables appear so simple
as to be interesting and by introducing graphs that are attractive and not
unduly complicated. Because of the purpose which a summary table is
to serve, the items shown therein will be arranged to place emphasis
where desired, and the columns and rows will be so placed as to facilitate
the comparisons of paramount importance.

A summary table is almost invariably the result of boiling down
information contained in one or more reference tables, although upon occasion
a summary table may be based, in whole or in part, upon one or more
other summary tables. Still more rarely, a summary table may be
constructed directly from data contained in schedule forms. The methods
which can be used in deriving one table from one or more others are:

1. Data which are not important for the problem in hand may be
omitted. Thus, although there are about twenty states which produce
sizeable amounts of bituminous coal, it might suffice to show separate
data for only the ten or twelve leading states.

2. Detailed data may be combined into groups. For example, data
shown by states may be grouped into geographical divisions. Again,
data shown by individual industries may be combined into broader
industrial groups. For example, the manufacture of brick, tile, and terra
cotta products; of cement, glass, and pottery; and the quarrying of
marble, granite, slate, and like products may be combined into the major
category “clay, stone, and glass products.”

3. The arrangement of data may be altered. Thus an alphabetical
arrangement of cities may be replaced by an arrangement according to
size of municipality.

4. Averages, ratios, percentages, or other computed measures may be
substituted or given in addition to the original absolute figures. A
column of percentages is shown in Table 3.5. It will be observed that
these figures facilitate the interpretation of the data upon which they
are based.
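Method 4 can be illustrated with the population figures of Table 3.4, from which a "per cent of total" column is computed. This is a sketch only: the dictionary and function names are hypothetical, and the rule for printing "**" is taken from the table's footnote (less than one-hundredth of one per cent).

```python
# Population, 1950, by region, from Table 3.4.
POPULATION_1950 = {
    "Continental United States": 150_697_361,
    "Hawaii": 499_794,
    "Alaska": 128_643,
    "Puerto Rico": 2_210_703,
    "Guam": 59_498,
    "Virgin Islands of the U. S.": 26_665,
    "American Samoa": 18_937,
    "Midway Islands": 416,
    "Wake Island": 349,
    "Other islands": 354,
    "Canal Zone": 52_822,
    "Corn Islands": 1_304,
    "Trust Territory of the Pacific Islands": 54_843,
    "Population abroad": 481_545,
}
TOTAL = sum(POPULATION_1950.values())


def per_cent_of_total(number, total=TOTAL):
    """Format a figure as a percentage of the total, to two decimals."""
    per_cent = 100 * number / total
    if per_cent < 0.01:
        return "**"   # less than one-hundredth of one per cent
    return f"{per_cent:.2f}"


for region, number in POPULATION_1950.items():
    print(f"{region:<40}{number:>13,}{per_cent_of_total(number):>8}")
```

Adding the regional figures also serves as a check on the table, since they should reproduce the printed total of 154,233,234.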

Comparisons. While the arrangement into columns and rows
makes it easy to compare the data, such treatment does not automatically
focus attention upon the comparisons that are important. This may be
effected by placing the figures to be compared in contiguous columns or
rows. Thus it may be seen that Table 3.1 facilitates the comparison of
data obtained at the two censuses for either males or females, while Table
3.3 makes it easy to compare the number of males and females enumerated
at either census.
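The difference between the two arrangements is only a matter of column order, as the following sketch shows. The figures are those of Tables 3.1 and 3.3; the function names and the `group_by` parameter are assumptions introduced for the illustration.

```python
# (male 1950, female 1950, male 1940, female 1940) for each Mountain state.
POPULATION = {
    "Colorado":   (665_149, 659_940, 568_778, 554_518),
    "Arizona":    (379_059, 370_528, 258_170, 241_091),
    "Utah":       (347_636, 341_226, 278_620, 271_690),
    "New Mexico": (347_544, 333_643, 271_846, 259_972),
    "Montana":    (309_423, 281_601, 299_009, 260_447),
    "Idaho":      (303_237, 285_400, 276_579, 248_294),
    "Wyoming":    (154_853, 135_676, 135_055, 115_687),
    "Nevada":     (85_017, 75_066, 61_341, 48_906),
}
CELL = {(1950, "Male"): 0, (1950, "Female"): 1, (1940, "Male"): 2, (1940, "Female"): 3}


def column_order(group_by):
    """Return the (year, sex) columns in the order used by each table."""
    years, sexes = (1950, 1940), ("Male", "Female")
    if group_by == "sex":       # Table 3.1: the two censuses adjacent for each sex
        return [(year, sex) for sex in sexes for year in years]
    if group_by == "census":    # Table 3.3: the two sexes adjacent for each census
        return [(year, sex) for year in years for sex in sexes]
    raise ValueError(f"unknown arrangement: {group_by}")


def render(group_by):
    """Lay out the table as plain text with the chosen column order."""
    columns = column_order(group_by)
    lines = ["State".ljust(12) + "".join(f"{sex} {year}".rjust(14) for year, sex in columns)]
    for state, figures in POPULATION.items():
        cells = "".join(f"{figures[CELL[column]]:,}".rjust(14) for column in columns)
        lines.append(state.ljust(12) + cells)
    return "\n".join(lines)


print(render("sex"))
print()
print(render("census"))
```

The same eight rows serve both tables; choosing which figures stand side by side is what directs the reader to one comparison or the other.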


TABLE 3.3

Number of Inhabitants in States of the Mountain Division,
by Sex, 1940 and 1950

                        1950                    1940
State             Male       Female       Male       Female

Colorado        665,149     659,940     568,778     554,518
Arizona         379,059     370,528     258,170     241,091
Utah            347,636     341,226     278,620     271,690
New Mexico      347,544     333,643     271,846     259,972
Montana         309,423     281,601     299,009     260,447
Idaho           303,237     285,400     276,579     248,294
Wyoming         154,853     135,676     135,055     115,687
Nevada           85,017      75,066      61,341      48,906

Data from U. S. Bureau of the Census, U. S. Census of Population: 1950,
Vol. II, Characteristics of the Population, in the Part for each state.

Each of these tables is well constructed, but each focuses attention upon
a different comparison. One of the most important considerations in
table construction is that figures which are to be compared must be placed
in immediate juxtaposition. It should be remembered that two or more
series of figures are more easily compared when placed in adjacent columns
than when placed in adjacent rows, and that figures of a series are more
easily compared with each other when arranged in a column than when
placed in a row.
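The effect of juxtaposition can be sketched in a few lines. The example below prints the same figures (the Arizona and Nevada rows as read from Table 3.9) with two different column orders: with sex as the major heading the two censuses stand side by side, and with year as the major heading the two sexes stand side by side. The function names and the column widths are assumptions for illustration only.

```python
# Figures keyed by (sex, year), as read from Table 3.9.
data = {
    "Arizona": {("Male", 1950): 379_059, ("Male", 1940): 258_170,
                ("Female", 1950): 370_528, ("Female", 1940): 241_091},
    "Nevada": {("Male", 1950): 85_017, ("Male", 1940): 61_341,
               ("Female", 1950): 75_066, ("Female", 1940): 48_906},
}

def column_order(major):
    """Return the (sex, year) column keys with `major` as the outer heading."""
    sexes, years = ("Male", "Female"), (1950, 1940)
    if major == "sex":    # the two censuses are adjacent for each sex
        return [(s, y) for s in sexes for y in years]
    if major == "year":   # the two sexes are adjacent for each census
        return [(s, y) for y in years for s in sexes]
    raise ValueError("major must be 'sex' or 'year'")

def render(major):
    """Build a plain-text table using the chosen column order."""
    cols = column_order(major)
    lines = ["State".ljust(10) + "".join(f"{s} {y}".rjust(14) for s, y in cols)]
    for state, cells in data.items():
        lines.append(state.ljust(10) + "".join(f"{cells[c]:>14,}" for c in cols))
    return "\n".join(lines)

print(render("sex"))
print()
print(render("year"))
```

Neither layout is the better one in general; the choice depends on which comparison the table is meant to stress.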

Comparisons may be greatly facilitated by the use of ratios, percent- 
ages, averages, or other computed relationships. Ratios are shown in 
Table 7.4; percentages, which are really a form of ratio (see Chapter 7), 


TABLE 3.4

Population and Area of the United States, Territories, Possessions, and
Other Holdings, 1950

                                                 Population
                                                           Per cent     Gross area
Region                                        Number       of total     in square
                                                                          miles

Total                                       154,233,234     100.00      3,628,130

Continental United States                   150,697,361      97.71      3,022,387
Territories:
  Hawaii                                        499,794       0.32          6,423
  Alaska                                        128,643       0.08        586,400
Possessions:
  Puerto Rico                                 2,210,703       1.43          3,435
  Guam                                           59,498       0.04            206
  Virgin Islands of the U. S.                    26,665       0.02            133
  American Samoa                                 18,937       0.01             76
  Midway Islands                                    416       **                2
  Wake Island                                       349       **                3
  Other islands*                                    354       **               33
Canal Zone†                                      52,822       0.03            553
Corn Islands‡                                     1,304       **                4
Trust Territory of the Pacific Islands           54,843       0.04          8,476
Population abroad§                              481,545       0.31            ...

* For a list of the islands, banks, reefs, and cays included in this category, see the source given
below. For some islands the area was not available.

† Under jurisdiction of the United States by treaty with the Republic of Panama.

‡ Leased from the Republic of Nicaragua. Population data are those of the May 1950 census of
the Republic of Nicaragua.

§ Excludes citizens abroad on private business, travel, and so forth, many of whom were enumer-
ated at their usual place of residence. Population data estimated from a sample.

** Less than one-hundredth of one per cent.

Data from U. S. Bureau of the Census, U. S. Census of Population: 1950, Vol. I, Number of Inhabi-
tants, Table 1 of United States Summary.


are included in Tables 3.4, 3.5, and 3.7. Ratios and percentages are
particularly useful when the absolute figures to be compared are large.
Note that in Tables 3.4 and 3.5 rather large population figures can be
compared readily by the use of percentages. When tables show monthly
fluctuations and both maxima and minima are noted, as in Table 3.7, the
additional entry “minimum as percentage of maximum” is useful for
purposes of comparison. Averages are shown in Tables 14.1, 14.3, and
14.7.

Emphasis. The proper placing of an item in a table enables it to be
given suitable emphasis. Since occidentals read from left to right and
from top to bottom, it follows that the most prominent position in the
stub is at the top, and in the box head the most prominent position is at
the left; likewise, the position of least prominence is at the bottom of the
stub and at the right of the box head. Notice that, by following this
principle in Table 3.3, males were emphasized rather than females, and
1950 was placed in a more prominent position than 1940.


TABLE 3.5

White Population and Foreign-born White Population of the
New England States, 1950

                       White        Foreign-born     Per cent
State                population        white          foreign
                                     population         born

Massachusetts        4,611,503        713,699          15.5
Connecticut          1,952,329        297,859          15.3
Rhode Island           777,015        113,264          14.6
New Hampshire          532,275         58,134          10.9
Maine                  910,846         74,342           8.2
Vermont                377,188         28,753           7.6

Data from U. S. Bureau of the Census, U. S. Census of Population: 1950, Vol. II,
Characteristics of the Population, Part 1, United States Summary, p. 1-106.


Totals are generally placed in either the most prominent or the least
prominent position, depending upon whether or not it is desired to give
emphasis to them. When “total” is shown at the top in the stub, a line
should be placed below the first row of figures, as in Table 3.4. If the
total entry is at the bottom of the stub, the figures are set off by a line
drawn above them, as in Table 3.7. An alternative procedure consists
of using a space instead of a line to set off the totals. Whatever its
position, the word “total” in the stub should be indented if possible.

Individual figures, or columns or rows of figures, may also be emphasized
by the use of boldface type, as in Table 3.6. When monthly
fluctuations of employment, sales, or other factors are shown, the maximum
figure may be set in boldface and the minimum may be put in italic
type, as in Table 3.7. In general, italics are used to indicate an exception
rather than for emphasis. Thus, in Tables 1 and 18 of Agricultural
Statistics 1952, the figures in italics are census returns, whereas all other
figures are compilations or estimates made by the Bureau of Agricultural
Economics. Italics are also sometimes used to show deficits, items to be
subtracted in arriving at a total, and items to be omitted from a total.
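Picking out the maximum and minimum of a monthly series, and forming the “minimum as percentage of maximum” entry, is a small computation. The sketch below uses the privately financed totals as they can be read from Table 3.7; the function name `extremes` and the plain-text markers standing in for boldface and italic type are assumptions for illustration.

```python
months = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]
# Privately financed dwelling units started, 1952, as read from Table 3.7.
private_total = [61_400, 74_300, 91_100, 97_000, 101_000, 96_900,
                 101_100, 97_400, 99_200, 99_200, 82_300, 67_600]

def extremes(labels, values):
    """Return (label of maximum, label of minimum, minimum as % of maximum)."""
    hi = max(range(len(values)), key=lambda i: values[i])
    lo = min(range(len(values)), key=lambda i: values[i])
    return labels[hi], labels[lo], round(100 * values[lo] / values[hi], 1)

peak, trough, ratio = extremes(months, private_total)

for month, value in zip(months, private_total):
    # Markers stand in for boldface (maximum) and italic (minimum) type.
    mark = " (maximum)" if month == peak else " (minimum)" if month == trough else ""
    print(f"{month:<10}{value:>9,}{mark}")
print(f"Minimum as percentage of maximum: {ratio}")
```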

Arrangement of items in stub and caption. Considering the basic
nature of statistical data which may be encountered, it was noted (page 3)
that data may refer to geographical, chronological, qualitative, or
quantitative classifications. We are now interested in the methods which may
be employed in arranging the items in the stub or the box head of a table.
The method of arrangement will be determined partly by the nature of

TABLE 3.6

Analysis of Disbursements and Recoveries of the Federal Deposit Insurance
Corporation in Transactions for Protection of Depositors and to Facilitate
Termination of Liquidations, 1934-1952

(In thousands)

                                          Transactions for the protection       Transactions to
                                                   of depositors                   facilitate
                                                                                 termination of
                                             Total      Receivership  Absorption   liquidations
                                                           cases        cases
Item                                      (420 banks)   (245 banks)  (175 banks)

Disbursements                                $322,148       $87,827     $234,321        $2,993
  Principal                                   276,044        87,044      189,000         2,716
  Payoff expenses (nonrecoverable)                783           783          ...           ...
  Liquidation expenses                         13,266           ...       13,266           ...
  Advances for asset protection                32,055           ...       32,055           277

Recoveries and income                         302,448        73,213      229,235         3,789
  Principal recovery to Dec. 31, 1952         247,392        72,866      174,526         1,691
  Estimated additional recovery of
    principal*                                  1,020           ...        1,020         1,005
  Liquidation expenses                         13,266           ...       13,266           ...
  Advances                                     32,055           ...       32,055           277
  Interest and allowable return (profit
    and income in termination trans-
    actions)                                    8,715           347†       8,368‡          816§

Net loss of funds                              19,700        14,614        5,086          -796**
  On principal                                 27,632        14,178       13,454            20
  Payoff expenses (nonrecoverable)                783           783          ...           ...
  Less: interest and income                     8,715           347†       8,368‡          816§

* Book value of remaining unliquidated assets less reserve for losses. The total for both
types of transactions [is shown] in Table 10 (of the Report) as “Assets acquired through
bank suspensions and absorptions.”

† Interest on subrogated claims in [number illegible] of the receivership cases in which receivers paid 100 per cent
dividends on creditors' claims.

‡ Interest on loans and allowable return on purchase price in [number illegible] absorption cases in which collections
exceeded the Corporation's disbursements and recoverable expenses. In [number illegible] of these cases full interest
or allowable return was collected and excess collections of [amount illegible] returned to the banks.

§ Profit plus net income (income on assets less liquidation expenses).

** Excess of receipts.

From Annual Report of the Federal Deposit Insurance Corporation for the year ended December 31,
1952, p. 10.


the data (whether basically geographical, chronological, qualitative, or
quantitative), and partly by a consideration of whether the data are to
appear in a reference table or in a summary table. A number of different
methods of arrangement may be employed.

Alphabetical. This method of arrangement is admirably adapted for
use in a general table, because it enables individual items to be located
with ease. It is, obviously, not a useful method for text tables. It can be
used only with series which are classified geographically or qualitatively.

Geographical. The geographical method of arrangement may be
employed for series classified geographically, but it is applicable only
when an established usage has been set up and should be used only when
the statistician is sure that his readers are familiar with the classification.
The customary order of the geographic divisions of the United States and


TABLE 3.7

Number of New Permanent Non-farm Dwelling Units Started in Urban and
Rural Non-farm Areas, by Source of Funds, January-December 1952

                                                                                 Privately and publicly
                         Privately financed            Publicly financed                financed
                             Rural                         Rural                         Rural
Month                   Urban non-farm      Total    Urban non-farm      Total    Urban non-farm      Total

January                32,800   28,600     61,400    3,300      200      3,500   36,100   28,800     64,900
February               39,700   34,600     74,300    3,100      300      3,400   42,800   34,900     77,700
March                  46,600   44,500     91,100   11,900      900     12,800   58,500   45,400    103,900

April                  50,400   46,600     97,000    8,600      600      9,200   59,000   47,200    106,200
May                    52,400   48,600    101,000    8,300      300      8,600   60,700   48,900    109,600
June                   49,900   47,000     96,900    6,200      400      6,600   56,100   47,400    103,500

July                   50,900   50,200    101,100    1,500        *      1,500   52,400   50,200    102,600
August                 49,400   48,000     97,400    1,400      300      1,700   50,800   48,300     99,100
September              51,300   47,900     99,200    1,500      100      1,600   52,800   48,000    100,800

October                52,100   47,100     99,200    1,700      200      1,900   53,800   47,300    101,100
November               42,300   40,000     82,300    3,700      100      3,800   46,000   40,100     86,100
December               36,800   30,800     67,600    3,800      100      3,900   40,600   30,900     71,500

Total                 554,600  513,900  1,068,500   55,000    3,500     58,500  609,600  517,400  1,127,000

Minimum as percentage
  of maximum             62.6     57.0       60.7     11.8      ...       11.7     59.5     57.4       59.2

* Fewer than 50 units.

Data from Monthly Labor Review, May 1953, p. 589.


of the various states may be seen in Table 6, of the United States
summary, in Volume I of the 1950 United States Census of Population.
Although the Census makes frequent use of the geographical method of
arrangement for the states, it almost invariably lists the counties of a
state alphabetically. For ease of reference, in a general table, the
geographical arrangement is hardly so satisfactory as the alphabetical.
Although it may be argued that the geographical arrangement often
places together contiguous, and therefore comparable, areas, it must be
obvious that the geographical arrangement does not always do so. It is
not usually a good method of arrangement for a summary table, since
this arrangement does not place important items in prominent positions.

Magnitude. A very satisfactory method of arranging items in a summary
table consists of listing them according to size, usually with the
largest item first, but sometimes with the order reversed. The states
shown in the stub of Table 3.3 are given in order of magnitude. When
the largest item is placed first, the most important items (numerically)
are placed in the most prominent positions. Arrangement of items
according to size is not useful in a general table because it does not
facilitate the finding of individual items as does the alphabetical
arrangement. Data classified geographically or qualitatively may be arranged
according to magnitude. So also may data classified chronologically, but
they lose their chronological sequence when arranged by magnitude.

Historical. Data classified on a chronological basis would generally
be arranged chronologically or historically. When years are listed, either
the most recent or the earliest date may be shown first. The months,
however, are customarily listed with January first. When the historical
arrangement is called for, it may be used in either general or text tables.
The historical arrangement is used in the stub of various tables in
Chapter 12.

Customary. Certain data that are basically qualitative are generally
arranged according to customary classes. Exports and imports are
often grouped into five categories: crude materials, crude foodstuffs,
manufactured foodstuffs, semi-manufactures, and finished manufactures.
The population of the United States, when divided into groups upon a
so-called “race-nativity” basis, is usually subdivided into the following
classes: native White, foreign-born White, Indian, Japanese,
Chinese, and “all other.” These are ordinarily used in the order given.
When an “all other” group appears in a table, it is ordinarily placed at
the bottom in the stub, or at the right in the box head. Good statistical
practice dictates that an “all other,” “miscellaneous,” or “not reported”
group should include relatively small numbers; otherwise, the adequacy of
the classification or the accuracy of the collection of the data may be
questioned. Arrangement by customary classes is appropriate for either a
text or a reference table. Quantitative data may be arranged into classes
as shown in the stub of Table 8.6. Such arrangements usually begin with
the class of smallest numerical value and may be used in either a text or
a reference table.

Progressive. This method of arrangement is illustrated in the stub of
Table 3.6. Notice that the items are listed in such a way that the final
figure develops logically from those given before. Another example of
the progressive arrangement was shown in the box head of a table which
presented monthly data of the number of strikes in the United States
during a year. The progressive headings in the box head were:

Continued     Beginning     In progress     Ended     In effect
  from           in           during         in       at end of
preceding       month          month        month       month
  month

The progressive arrangement is suitable for either text or reference tables.

Numerical. The wards of cities are usually designated as Ward 1,
Ward 2, and so forth. When data for such subdivisions are shown, a
numerical arrangement is generally followed. The precincts and districts
of counties are sometimes numbered; the departments of a factory and
salesmen's territories or sales areas may also be identified by numerical
designations. This method may appear in either a text or a reference
table. The numbers assigned to the categories are frequently only labels
serving to identify some underlying arrangement. For example, in a
shoe factory, Department 1 was the cutting department; Department 2,
the fitting department; Department 3, the lasting department; and so
forth.


In using the various methods of arrangement, remember that in a
reference table, the items should be arranged for greatest ease of reference,
whereas in a text table the arrangement should be designed to emphasize
the important items and to stress the proper comparisons.
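Two of the arrangements described above, alphabetical for a reference table and magnitude for a text table, can be sketched as follows. The figures are the white-population entries as read from Table 3.5; the function name `arrange` and its `method` argument are assumptions made only for this example.

```python
# White population of the New England states, 1950, as read from Table 3.5.
white_population = {
    "Massachusetts": 4_611_503,
    "Connecticut": 1_952_329,
    "Rhode Island": 777_015,
    "New Hampshire": 532_275,
    "Maine": 910_846,
    "Vermont": 377_188,
}

def arrange(items, method):
    """Return (label, value) pairs ordered for the stub of a table."""
    if method == "alphabetical":   # ease of reference
        return sorted(items.items())
    if method == "magnitude":      # largest item in the most prominent position
        return sorted(items.items(), key=lambda pair: pair[1], reverse=True)
    raise ValueError("method must be 'alphabetical' or 'magnitude'")

for method in ("alphabetical", "magnitude"):
    print(method)
    for state, value in arrange(white_population, method):
        print(f"  {state:<15}{value:>11,}")
```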


DETAILS OF TABLE CONSTRUCTION

Title and identification. A title should accompany every table and
is customarily placed above the table. The title should be clearly worded
and should state briefly what data are shown in the table. A title should
be so worded as to mention the more important considerations first,
placing toward the end any statement concerning how the items are
arranged and what period of time is covered. In general the title states,
in order: what, where, how classified, and when. Illustrations of titles
are shown in the various tables of this chapter. It will be noted that,
when a title necessitates the use of several lines, an inverted-pyramid
arrangement is used.

If a title is long, it may be advantageous to place a “catch title” above
the main title or, occasionally, to substitute the catch title for the full
title. This shorter title undertakes merely to state the general nature of
the data in the table. For Table 3.7 a catch title might read “New
Dwelling Units in 1952.”

When more than one table is included in a study, it is desirable to
number the tables consecutively in order that each one may be identified by
number rather than by title.

Prefatory note and footnotes. A prefatory note, one or more
footnotes, and a source note may be appended to a table. A prefatory note is
placed just below the title and in smaller or less prominent type. The
prefatory note provides an explanation concerning the entire table or a
substantial part of it, as in Table 3.6.

Explanations concerning individual figures, or a column or row of
figures, should be given in footnotes. Footnotes keyed to stub entries
and column headings may be referred to by means of numbers; however,
footnotes keyed to figures should be identified by a symbol (*, †, ‡,
etc.), as in Table 3.6, or by a letter, but preferably not by a number. In
this book, symbols have been used for keying footnotes to figures, stub
entries, column headings, and titles of tables.

Source notes. As previously indicated, the source note may appear
below the title or below the footnotes. The latter practice has been
generally followed in this text. The data set forth in a table will not often
be data which the investigator has collected. Usually the figures will
have been taken from one or more published or unpublished sources.
The source note should be complete, giving author, title, volume, page,
publisher, and date. Not only is it courteous to mention the source of
data quoted, but such information gives the reader some idea of the
reliability of the data and makes it possible for him to refer to the original
source to verify quoted figures or to obtain additional information.

Sometimes data are taken from a secondary source instead of a primary
source because the secondary source may be more convenient. In such
a case it may be advisable to mention both sources; for example, “Source:
National Board of Fire Underwriters as quoted in Statistical Abstract of
the United States, 1953, p. 170.”

Data for a table may sometimes be taken from two or more different
sources. When this is done, care must be exercised to see that the data
are comparable. The importance of comparability of data was discussed
in Chapter 2; it is not necessary to say more on that topic at this point.

When apparent mistakes are found in a source, it is well to call
attention to the fact. The December 1935 Monthly Labor Review (p. 1503)
reprints a table from The Oriental Economist showing that total payrolls
in 10 industries in Japan in 1933 were 647,340,199 yen, but points out in
a footnote that, if the figures given for each of the 10 industries are added,
the result is 647,430,199 yen.

Percentages. When percentages are used in a table, the stub or the
caption entry should indicate clearly to what figures the percentages
relate. Thus, the term “per cent” alone should be avoided; rather say,
“per cent of total,” “per cent of increase or decrease,” and so forth.
Sometimes tables are divided into a “number” section (showing absolute
figures) and a “per cent” section, as in Table 7.5. This table and Table
7.2 illustrate the use of adequate headings referring to percentages.

The percentages in the last column of Table 8.6 total 99.9, while those
in the column just to the left of that one total 99.8. When individual
percentages are written correct to tenths of one per cent, as is customary,
the total will occasionally be slightly over or below 100.0 because of the
accumulation of positive or negative remainders when rounding. If the
percentages had been entered in hundredths or thousandths of a per cent,
the total would have been closer to 100.0. Although a “per cent of
total” column may add to slightly more or less than 100.0, the total is
shown as 100.0, since that is what the individual percentages would yield
if carried out far enough. If a total adds to less than 99.8 or more than
100.2, it is advisable to re-check the calculations for mistakes.
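The accumulation of remainders is easy to reproduce. In the sketch below, three equal (hypothetical) counts give percentages that total 99.9 when written to tenths and 99.99 when written to hundredths. The function names and the example counts are assumptions for illustration; the 99.8 to 100.2 tolerance is the one stated above.

```python
def per_cent_of_total(counts, places=1):
    """Return each count as a percentage of the total, rounded to `places`."""
    total = sum(counts)
    return [round(100 * c / total, places) for c in counts]

def needs_recheck(percents, low=99.8, high=100.2):
    """True when the rounded percentages add to less than `low` or more than `high`."""
    added = round(sum(percents), 1)
    return added < low or added > high

counts = [1, 1, 1]                          # hypothetical counts
tenths = per_cent_of_total(counts)          # 33.3 each
hundredths = per_cent_of_total(counts, 2)   # 33.33 each

print(tenths, round(sum(tenths), 1))
print(hundredths, round(sum(hundredths), 2))
print("Re-check needed:", needs_recheck(tenths))
```

A column that fails the check is not necessarily wrong, but a discrepancy that large is more likely to come from a mistake than from rounding.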

Rounding numbers. In order to avoid confusion and to facilitate
comparisons, numbers of many digits may be rounded. Numbers may
also be rounded because the compiler feels that they are accurate, not to
the final digit, but only in terms of (say) thousands or millions. The
figures shown in Table 3.7 were rounded (but no digits dropped!)
presumably to call attention to the fact that they were estimates.

When numbers are rounded, a statement to that effect should be made
in a prefatory note or in the stub or the box head. The wording may be
“millions of . . . ,” “000,000 omitted,” and like expressions. Tables
3.6, 7.1, and 7.2 contain rounded numbers, and mention of that fact is
made in a prefatory note or in the appropriate box head.

If a series of figures is to be expressed in thousands of dollars, for
example, the rounding is to the nearest thousand. Thus $2,648,302 would
become $2,648 (thousand) and $7,226,782 would become $7,227 (thousand).
If the heading “thousands of dollars” appears in the box head
(or stub) of a table or as a prefatory note, the dollar mark is not needed.

No serious error is ordinarily introduced by rounding. If each of a
series of numbers is rounded, some will be raised and some will be lowered,
but the errors so introduced tend to offset each other. Furthermore, it
may be felt that to show all the digits of a large number is to give the
appearance of spurious accuracy. For example, the population of the
United States was ascertained to be 150,697,361 persons in 1950, but the
figure could hardly be accurate to units or even to hundreds. However,
it may be maintained that the figure 150,697,361 is the one obtained by
the best methods available and is therefore probably more accurate than
any rounded figure. Irrespective of the merits of these two points of
view, six (or fewer) significant figures may often be accurate enough for
the comparisons desired. Further mention of rounding (and of significant
digits) is made on pages 139-140 and in Appendix T.

When computed values, such as totals, percentages, and averages, are
to be shown in tables of rounded figures, these values should, if possible,
be calculated from the original figures before rounding.
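The reason for computing totals from the original figures can be seen in a small sketch. The first two amounts below are the dollar figures from the rounding example in the text as they appear to read; the third amount is hypothetical, chosen so that the two ways of forming the total disagree. Rounding exact halves upward is an assumption of this sketch, not a rule taken from the text.

```python
def to_thousands(amount):
    """Round a whole-dollar amount to the nearest thousand (exact halves go up)."""
    return (amount + 500) // 1000

amounts = [2_648_302, 7_226_782, 1_000_450]   # third amount is hypothetical

rounded = [to_thousands(a) for a in amounts]
total_from_original = to_thousands(sum(amounts))   # preferred: round the true total
total_from_rounded = sum(rounded)                  # may be off by accumulated error

print(rounded)
print(total_from_original, total_from_rounded)
```

Here the total computed from the original figures is 10,876 (thousand), while adding the rounded entries gives 10,875 (thousand).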

Totals. We have previously noted that totals, when of major impor- 
tance, may be placed at the top in the stub and at the left in the caption. 
When it is not desired to emphasize totals, they may be placed at the bot- 
tom in the stub and at the right in the caption. 

Table 3.7 carries both total columns and a total row. An arrangement
such as this results in a single number (1,127,000) which is sometimes
termed a “grand total” or a “checked grand total.” The fact that the
figures yield the same sum when added vertically and horizontally is not a
positive check, since two or more compensating errors may have been
made. That, however, does not often happen. We do have definite
proof either that no errors were made or that more than one was made.
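The checked grand total, and the way compensating errors can slip past it, may be sketched as follows. The body figures are the urban and rural non-farm entries for October through December as read from the last section of Table 3.7; the function name and the deliberately wrong totals are assumptions for illustration.

```python
# Urban and rural non-farm units, October to December 1952 (as read from Table 3.7).
body = [
    [53_800, 47_300],
    [46_000, 40_100],
    [40_600, 30_900],
]

row_totals = [sum(row) for row in body]
column_totals = [sum(col) for col in zip(*body)]

def grand_total_check(row_totals, column_totals):
    """True when the row totals and the column totals add to the same figure."""
    return sum(row_totals) == sum(column_totals)

print(grand_total_check(row_totals, column_totals))            # totals agree

one_error = [row_totals[0] + 100] + row_totals[1:]
print(grand_total_check(one_error, column_totals))              # single error is caught

two_errors = [row_totals[0] + 100, row_totals[1] - 100, row_totals[2]]
print(grand_total_check(two_errors, column_totals))             # compensating errors pass
```

Agreement therefore shows that either no error or more than one error was made, which is why the check is useful but not conclusive.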

Units. The units of measurement of the figures in a column or a row
of a table may often be self-explanatory. When this is not true, the
nature of the unit should be made clear in the stub or the box head, as in
Table 7.1. If the explanation applies to all figures in the table, it may
appear as a prefatory note. Data of monetary units are usually
self-descriptive, because of the use of the dollar sign. Note, in Table 3.6,
that this sign appears for only the first entry in a column.

Size and shape of table. In general, a table should be designed so
that it will be neither very long and narrow nor very short and wide. A
table must also be adjusted to the space in which it is to appear. Usually
this limitation takes the form of a page of a book or a report. Of course,
a table need not occupy the entire length or width of a page. If the table
is too large for the allotted space, it may be recast into several smaller
tables. Reduction of type size may permit a table to be included on a
page, but reduction should not be made at the expense of legibility. If
the use of a folded page is not desirable, the table may be arranged to
occupy two facing pages. Because of the difficulty of aligning pages
perfectly in binding, the stub is often repeated on the second page.
When reference tables are continued over several pages, they may be split
either vertically or horizontally. In either case, complete stub and
caption entries should appear on each page, the title should be repeated
on each page, and footnotes may appear at the bottom of the appropriate
page or may be accumulated at the end of the table.

The horizontal dimension of a table may be determined by allowing for:

(1) Width of stub, determined by longest entry. (A very long entry
may be put on two or more lines to save space; see the last item in the
stub of Table 3.7.)

(2) Width of each column, determined by largest number or by entry
in each box head. (By hyphenating words, an entry in a box head may
be compressed horizontally and expanded vertically.)

(3) Ruling.

(4) Margins.

The vertical dimension may be ascertained by considering:

(1) Space needed for title, prefatory note, footnotes, and source note.
Since the first line of the title should not exceed the table in width, a long
title may require several lines.

(2) Number of lines needed for the heading, in the stub or box head,
which requires the most vertical space.

(3) Number of rows in body of table.

(4) Ruling.

(5) Margins.
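The two lists above amount to simple additions, which can be sketched for a typewritten table measured in character positions and lines. Everything in the sketch is an assumption for illustration: the unit of measurement, the allowance of one position per column for ruling, the default margins, and the function names.

```python
def table_width(stub_entries, columns, rule_width=1, margin=2):
    """Estimate width in character positions.

    `columns` is a list of (box-head entry, list of cell strings).
    """
    stub = max(len(entry) for entry in stub_entries)           # longest stub entry
    column_widths = [
        max([len(heading)] + [len(cell) for cell in cells])    # widest of heading or figures
        for heading, cells in columns
    ]
    ruling = rule_width * len(columns)                          # one rule before each column
    return stub + sum(column_widths) + ruling + 2 * margin

def table_height(title_lines, note_lines, heading_lines, body_rows,
                 rule_lines=3, margin=1):
    """Estimate height in lines: title, notes, heading, body, ruling, margins."""
    return (title_lines + note_lines + heading_lines + body_rows
            + rule_lines + 2 * margin)

width = table_width(
    ["Maine", "Vermont"],
    [("White population", ["910,846", "377,188"]),
     ("Per cent", ["8.2", "7.6"])],
)
height = table_height(title_lines=2, note_lines=1, heading_lines=2, body_rows=2)
print(width, height)
```

If the estimate exceeds the space available, the remedies named in the text apply: breaking long stub entries over two lines, hyphenating box-head entries, or recasting the table.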

Ruling. Most of the tables in this text are shown with single-line
ruling and are open at the sides. Double-line ruling is sometimes used,
but double lines seem to make either hand-ruled or printed tables appear
somewhat complicated. Tables are rarely closed at the sides, and should
never appear with one side closed and one open.

There seems to be a growing tendency to use text tables without ruling,
either vertical or horizontal. Table 3.8 shows how Table 3.5 might
appear when no ruling is used.

TABLE 3.8

White Population and Foreign-born White Population of the
New England States, 1950

                       White        Foreign-born     Per cent
State                population        white          foreign
                                     population         born

Massachusetts        4,611,503        713,699          15.5
Connecticut          1,952,329        297,859          15.3
Rhode Island           777,015        113,264          14.6
New Hampshire          532,275         58,134          10.9
Maine                  910,846         74,342           8.2
Vermont                377,188         28,753           7.6

Data from U. S. Bureau of the Census, U. S. Census of Population: 1950, Vol.
II, Characteristics of the Population, Part 1, United States Summary, p. 1-106.

An examination of tables in this book and elsewhere will show that:

(1) No horizontal lines are used in the body of a table except to set off
totals and occasionally to separate a table into distinct parts.

(2) Horizontal lines separating major and minor box heads do not
continue into the stub heading.

(3) All vertical lines separating box heads appear only between the box
heads which they separate; they do not extend above these box heads.

Guiding the eye. Skipping a line every three, four, or five rows, as in
Table 3.7, makes it easier for the eye to follow the rows across a table.
The use of leaders in the stub of a table is also helpful.

Zeros. It is not customary to show a zero in a table (other than a
computation form). When no cases have been found to exist or when the
value of an item is zero, the fact may be indicated by means of dots (. . .)
or short dashes (–). When there is no figure for an entry because
information is lacking, a footnote should be used to indicate that fact.
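When a table is produced mechanically, the convention for zeros and missing figures can be applied as each cell is formatted. In this sketch, the use of `None` to stand for missing information and of an asterisk as the footnote key are assumptions for illustration.

```python
def cell(value, footnote_mark="*"):
    """Format one table entry following the convention for zeros and gaps."""
    if value is None:        # information lacking: key the entry to a footnote
        return footnote_mark
    if value == 0:           # no cases, or a value of zero: show dots, not "0"
        return ". . ."
    return f"{value:,}"

for value in (1_500, 0, None):
    print(cell(value).rjust(8))
```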

Size and style of type. Too much variety in size or style of type (or
lettering) is not desirable. In general the title should be most prominent
and is usually set in large and small capitals or in boldface type. The
items listed in the stub and caption and the figures in the body of the table
are usually set in the same size type. Footnotes, prefatory note, and
source note are generally set in smaller type than that used in the body of
the table.


STATISTICAL REPORTS

When making a statistical report, the method of preparing the tables
will be dictated partly by the number of copies of the report required and
partly by the cost involved. Tables may be handwritten, typewritten,
mimeographed, multigraphed, reproduced by a photostatic or photographic
process from handwritten or typed tables, or printed.

There is a distinct disadvantage in the use of the ordinary typewriter
for preparing other than relatively simple tables, because of the lack of
flexibility of spacing and of size of type. Table 3.9 shows a table without
ruling, prepared on an ordinary typewriter with pica type. Table 3.10
presents the same data and indicates how ruling may be done on a
typewriter. Note that more flexibility was obtained by using two typewriters,
one with pica and one with elite type. By using elite type for the stub
entries and the body, a certain amount of space may be saved. Somewhat
more flexibility in planning a table may be had by using a typewriter
with variable spacing and with different kinds and sizes of type.

If only a few copies of a report are required and if the tables are simple,
the tables and accompanying text may be typed and carbon copies made.
If several dozen copies are needed, the longhand or typed material may
be photostated at a cost of about 25 cents per 8½ X 11-inch page. By
this method, reduction or enlarging is possible and copies may be had
rather promptly, since no plate need be made. If a larger number of
copies is required, resort may be had to mimeographing or multigraphing.
Tables may also be reproduced by a photo-offset process, which is quite
satisfactory and is often cheaper than printing because typesetting is
avoided. Enlarging or reduction is possible; typed material may be
reduced so that 4 ordinary 8½ X 11-inch pages (pica type) will appear on
one page. It should be noted that the typed copy should be a first-class
job if satisfactory reproductions are to be obtained.

Occasionally the gelatin-pan method may be useful when only a few 
copies are needed. A special ink is available for handwritten material 

TABLE 3.9

NUMBER OF INHABITANTS IN THE STATES OF THE MOUNTAIN DIVISION,
BY SEX, 1940 AND 1950

                       Male                  Female
State             1950      1940        1950      1940

Colorado         665,149   568,778     659,940   554,518
Arizona          379,059   258,170     370,528   241,091
Utah             347,636   278,620     341,226   271,690
New Mexico       347,544   271,846     333,643   259,972
Montana          309,423   299,009     281,601   260,447
Idaho            303,237   276,579     285,400   248,294
Wyoming          154,853   135,055     135,676   115,687
Nevada            85,017    61,341      75,066    48,906

Data from U. S. Bureau of the Census, U. S. Census of Population:
1950, Vol. II, Characteristics of the Population, Table 13
in the Part for each state.

TABLE 3.10 

NUMBER OF INHABITANTS IN THE STATES OF THE MOUNTAIN 
DIVISION, BY SEX, 1940 AND 1950 

Graphic Presentation I: 

CURVES USING ARITHMETIC SCALES 


THE GRAPHIC METHOD 

Attention has already been given to the presentation of statistical data 
by means of text, tabular, and semi-tabular devices. Ordinarily, 
statistical data will be presented in the form of either a table or a chart. 
This chapter and the two which follow are devoted to a discussion of the 
portrayal of statistical data by graphic devices. As will be readily seen 
from a perusal of the pages of this book, charts or graphs are more 
effective in attracting attention than are any of the other methods of 
presenting data. Readers are therefore not so likely to skip a chart as to 
skip a table. A simple, attractive, well-constructed graph, showing a limited 
set of facts, is also easier to understand than is a table. 

The outstanding effectiveness of a chart for presenting a limited 
amount of data makes it a most useful statistical tool. Certain 
limitations should be noted, however. In the first place, charts cannot show 
so many sets of facts as may be shown in a table. Numerous columns 
and rows may appear in a table; but imagine Chart 4.2 with six or eight 
criss-crossing and intertwining lines, and it is immediately obvious why a 
chart should show only a limited amount of information. In the 
second place, although exact values can be given in a table, only 
approximate values can ordinarily be shown by a chart. In a table we may 
enter as many digits as are desired, but we can plot only the approximate 
value on a chart. For example, while the data upon which Chart 4.2 is 
based could be recorded in a table in terms of bales, a chart could show 
only thousands, or at best hundreds, of bales. Thus charts are useful for 
giving a quick picture of a general situation, but not of details. In the 
third place, charts require a certain amount of time to construct, since 
each one is an original drawing. This difficulty, however, is offset by the 
added effectiveness which the chart possesses in comparison with a table.¹ 

TYPES OF CHARTS 

In this text we shall discuss: curves or line diagrams; bar charts, involving 
one-dimensional comparisons; area diagrams, involving two-dimensional 
comparisons (including particularly pie diagrams, which involve one- or 


[Chart 4.1. Axes for curve plotting: X- and Y-axes scaled from -5 to +5, 
Quadrants I, II, III, and IV labeled, and points P1, P2, P3, and P4 plotted, 
one in each quadrant.] 

two-dimensional comparisons, or comparisons of angles); volume diagrams, 
which call for a visualization of the third dimension and three-dimensional 
comparisons; pictographs, which involve aspects of both volume diagrams 
and bar charts; and statistical maps. Other specialized types of charts 
and certain charts which are graphic but not statistical (for example, 
organization and procedure charts) are not treated here, but are discussed 
in books on graphic methods. This chapter will consider only curves 
using arithmetic scales. In the following chapter attention will be given 
to curves using a logarithmic vertical scale and an arithmetic horizontal 
scale. Chapter 6 will include brief discussions of bar charts, area 
diagrams, volume diagrams, pictographs, and statistical maps. 

¹ William Playfair, who is understood to have “invented outright” the graphic 
method in the latter part of the 18th century, says: “The advantage proposed by this 
method, is not that of giving a more accurate statement than by figures, but it is to 
give a more simple and permanent idea of the gradual progress and comparative 
amounts, at different periods, by presenting to the eye a figure [chart], the proportions 
of which correspond with the amount of the sums intended to be expressed.” See 
the article “Playfair and His Charts,” by H. Gray Funkhouser and Helen M. Walker, 
in Economic History, February 1935, pp. 103-109. 

PLOTTING A CURVE 

When statistical data are shown as curves, the points are plotted in 
reference to a pair of intersecting lines, called axes and shown in Chart 

[Chart: vertical scale in millions of bales.] 

Chart 4.2. Production of Cotton in the United States, 1925-1953. Data 
from U. S. Department of Agriculture, Agricultural Statistics, p. 70, and 
p. 04, and The Cotton Situation, issued by the Agricultural Marketing Service of the 
U. S. Department of Agriculture, May 27, 1954, p. 18. 

4.1. The horizontal line is known as the “X-axis” and the vertical line is 
designated as the “Y-axis.” Positive values are shown to the right of 
zero on the X-axis and above the zero on the Y-axis; negative values are 
placed to the left of zero on the X-axis and below the zero on the Y-axis. 
The point at which the two axes intersect is zero for both X and Y and 
is referred to as the “zero point,” the “point of origin,” or merely the 
“origin.” The positive and negative values on the axes increase in 
magnitude as we move away from this origin. 

The two axes of Chart 4.1 divide the plotting area into four sections 
known as “quadrants.” For reference purposes, these quadrants are 
designated I, II, III, and IV. Quadrant I accommodates values which 
are positive on both the X- and Y-axes. Quadrant II provides for values 

[Chart: vertical scale shows number of optometrists.] 

Chart 4.3. Net Income of 1,764 Optometrists in 1951. Data from the 
American Optometric Association. The frequencies for the last three plotted classes 
are estimates. 


which are negative on the X-axis and positive on the Y-axis. Quadrant 
III takes care of values which are negative on both axes. Quadrant IV 
is for values which are positive on the X-axis and negative on the Y-axis. 

Any point plotted in one of the quadrants may be located by referring 
to its abscissa value, which is its horizontal or X distance from zero, and 
to its ordinate value, which is its vertical or Y distance from zero. For 
illustrative purposes four points have been plotted on Chart 4.1, one in 
each quadrant: P1 represents X = +4, Y = +2; P2 indicates X = -3, 
Y = +3; P3 is X = -4, Y = -3; P4 shows X = +3, Y = -2. 

When the axes are used as bases of reference for plotting equations, any 
or all of the quadrants may be used, since many equations may call for 
negative values of X or of Y, or of both. At present, however, we are not 
interested in the graphic representation of equations, but in graphically 
portraying observed statistical data. When we are dealing with 
statistical data, it must be obvious that both the X and Y variables are 
ordinarily positive quantities, and that therefore we shall generally use 
only the quadrant designated as I. Chart 4.2, showing the production 
of cotton in the United States over a period of years, is an example of a 
curve lying wholly in quadrant I. 
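
The location of a point by its abscissa and ordinate can be sketched in a 
few lines of Python. This is only an illustration: the function name 
`quadrant` and the decision to assign points lying on an axis to no quadrant 
are assumptions, not part of the text. The four points are those plotted on 
Chart 4.1. 

```python
def quadrant(x, y):
    """Return the quadrant (I-IV) of the point (x, y), or None if it lies on an axis."""
    if x == 0 or y == 0:
        return None  # assumption: a point on an axis belongs to no quadrant
    if x > 0:
        return "I" if y > 0 else "IV"
    return "II" if y > 0 else "III"


# The four illustrative points of Chart 4.1, given as (abscissa, ordinate).
points = {"P1": (4, 2), "P2": (-3, 3), "P3": (-4, -3), "P4": (3, -2)}
located = {name: quadrant(x, y) for name, (x, y) in points.items()}
print(located)
```

Observed statistical data, being ordinarily positive on both scales, would 
nearly always be reported as lying in quadrant I. 
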

Quadrants II and IV are occasionally used in conjunction with 
quadrant I. Chart 4.3 shows a curve which makes use of quadrants I and II; 
the curve of Chart 4.4 lies partly in quadrant I and partly in quadrant IV. 
Since both X and Y values are negative in quadrant III, that quadrant 
is rarely used. 

TYPES OF DATA SHOWN BY CURVES 

It was noted earlier that statistical data may be classified according 
to chronological, geographical, quantitative, or qualitative characteristics. 
Curves are frequently used for picturing time series and for showing 
frequency distributions (by far the most important sort of quantitatively 
classified data), although, of course, other types of graphs are also 
applicable as shown in the following chapters. Qualitatively and, especially, 
geographically classified data are rarely depicted by curves; instead, bar 
charts and other devices are used, as will be indicated hereafter. 

Time series curves. The method of plotting time series depends 
upon the type of data to be represented. We may distinguish between 
period data and point data. Period data, such as total sales per month, 
average monthly sales per year, and average prices during the year, refer 
to a period of time. Point data are those, such as inventory values, price 
quotations, or temperature readings, which refer to a particular point of 
time. Whenever chronological data are depicted by means of a curve, 
the years, months, weeks, days, or other chronological units are shown on 
the horizontal axis; the other series, which varies with time, is placed on 
the vertical axis. 

Charts 4.2 and 4.22 show period data. When annual data of this type 
are plotted, the dates on the horizontal scales may be placed below the 
vertical lines, as in Chart 4.2, or below spaces, as in the left-hand part of 
Chart 4.22. Either method may be used; one argument for labeling the 
spaces is that this gives a visual impression of time as having duration. 
When monthly (and daily, weekly, or quarterly) data are plotted for a 
number of years, there is no choice but to label the spaces representing 
each year, since, if the lines were labeled, it would not be immediately 
obvious to all readers whether the label referred to the space preceding 
the line, the space following the line, or possibly half of the space on 
each side. Each horizontal year-space is divided into 12 parts for the 
plotting of the monthly figures, and these figures may be plotted at the 
middle of each of the 12 spaces. Chart 4.4 illustrates this for period data 
on a monthly basis. 


[Chart: vertical scale shows excess of arrivals over departures, in 
thousands.] 

Chart 4.4. Net Arrivals and Departures of United States Citizens, January 
1947-December 1952. Data from U. S. Department of Commerce, Office of Business 
Economics, Business Statistics, 1951, p. 111, and 1955, p. 118. Beginning January 
1951, all travel over international land borders was excluded from the figures of 
arrivals and departures; see note 4 on page 246 of Business Statistics, 1955. 


When point data are being represented by a curve, spaces, rather than 
lines, should be labeled on the horizontal axis and the observations should 
be plotted within the spaces at the point in time to which the data refer. 
This latter consideration is more important for annual data than for 
monthly data. However, for monthly data we should, ideally, (1) plot 
beginning-of-the-month data (such as figures of cold-storage holdings as 
of the first of each month) at the beginning of each space representing a 
month, (2) plot middle-of-the-month data (for example, payroll data for 
the payroll nearest the fifteenth of each month) at the middle of each 
space, and (3) plot end-of-the-month data (such as money in circulation 
at the end of each month) at the end of each space. This is illustrated 
in the three parts of Chart 4.5. If this procedure is not followed, the 
appearance of a curve of monthly data is not altered; the curve is merely 
shifted to the left or to the right. 
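
The placement of monthly observations within a year-space may be made 
concrete with a short Python sketch. The function `month_position`, its 
parameters, and the convention of measuring horizontal distance in 
year-spaces are assumptions adopted here for illustration only. 

```python
def month_position(year_index, month, placement="middle"):
    """Horizontal position, in year-spaces, of a monthly observation.

    year_index: 0 for the first year-space on the chart, 1 for the next, etc.
    month:      1 (January) through 12 (December)
    placement:  "beginning", "middle", or "end" of the month-space
    """
    offsets = {"beginning": 0.0, "middle": 0.5, "end": 1.0}
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    # Each year-space is divided into 12 equal month-spaces.
    return year_index + (month - 1 + offsets[placement]) / 12


# Period data (for example, monthly sales): middle of the month-space.
january_sales_x = month_position(0, 1, "middle")
# Point data as of the first of the month: beginning of the month-space.
january_first_x = month_position(0, 1, "beginning")
# Point data as of the end of the month: end of the month-space.
december_end_x = month_position(0, 12, "end")
```

Changing the placement moves every point by the same fraction of a 
month-space, which is why the shape of the curve is unaltered and the curve 
is merely shifted. 
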



[Three small charts: A. Beginning-of-the-month data. B. Middle-of-the-month 
data. C. End-of-the-month data.] 

Chart 4.5. Methods of Plotting Monthly Point Data. Each small chart 
represents the twelve months of a year. 


Curves of frequency distributions. The curve of Chart 4.3 is a 
graphic representation of a frequency distribution. Frequency 
distributions will not usually continue into the second quadrant as does this one. 
In this instance, however, there were some negative incomes. 

Table 4.1 shows a frequency distribution² of the grades of the 1952 
graduating class of the United States Merchant Marine Academy. In 

TABLE 4.1 

Frequency Distribution of Grades Received 
for the Four-Year Course by 225 Cadet- 
Midshipmen of the 1952 Graduating 
Class of the United States 
Merchant Marine Academy 

Data from United States Merchant Marine 
Academy. 


order to show the genesis of the frequency distribution curve, the data 
are first represented by a series of rectangles or bars in the “column 
diagram” of Chart 4.6. It will be noticed that the grades have been 
placed along the horizontal axis and the frequencies (number of 
cadet-midshipmen) along the vertical axis. There are as many columns in the 
chart as there were classes in the table, and the height of each column 
represents the frequency for the corresponding class. This column 
diagram is transformed into a curve by connecting the midpoint of the 
top of each rectangle with the midpoint of the top of each adjacent 

² Frequency distributions are discussed in Chapter 8. 


[Chart: vertical scale shows number of cadet-midshipmen; horizontal scale 
shows grade.] 

Chart 4.6. Grades Received for the Four-Year Course by 225 Cadet- 
Midshipmen of the 1952 Graduating Class of the United States Merchant 
Marine Academy, Shown by a Column Diagram and by a Frequency 
Curve. Data of Table 4.1. 

rectangle, as shown by the broken line in Chart 4.6. This is done upon 
the assumption that the values in a class interval are evenly 
distributed throughout the class. The mid-value of a class is consequently 
taken as representing the class.³ It will be observed that the dotted line 
cuts off some small triangular pieces of the original rectangles and that it 
also includes some small triangles not formerly included, but it is obvious 
that triangle A = triangle A', triangle B = triangle B', and so forth. 
Sometimes the curve is continued at each end to join the X-axis 
(indicating a frequency of zero) at the mid-value of the next possible class. 
This procedure results in having the same area under the curve as is 
included in the rectangles. However, the result may sometimes be a 
curve which extends beyond zero on the X-axis, and this is apt to be 
meaningless. In any event the extensions suggest to the reader that 
items occurred beyond the limits of the observed data. Except for 
special purposes (see Chart 23.14), it is better not to extend the curve to 
the X-axis. The frequency distribution may be shown either as a column 
diagram or as a frequency curve (frequency polygon). The latter is 

³ This point is discussed at greater length in Chapter 9. 

[Chart: vertical scale shows number of cadet-midshipmen.] 

Chart 4.7. Grades Received for the Four-Year Course by 225 Cadet- 
Midshipmen of the 1952 Graduating Class of the United States Merchant 
Marine Academy. Data of Table 4.1. 


more usual and the curve is plotted directly, as in Chart 4.7, without the 
intermediate step of constructing columns. 
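
The construction of a frequency polygon from class mid-values, and the 
equality of areas when the curve is extended to the X-axis, can be checked 
with a small Python sketch. The class limits and frequencies below are 
hypothetical figures chosen for illustration (they are not the data of 
Table 4.1), and equal class widths are assumed. 

```python
# Hypothetical frequency distribution with equal class widths.
lower_limits = [70, 75, 80, 85, 90]
width = 5
frequencies = [5, 20, 40, 25, 10]

# Each class is represented by its mid-value.
midpoints = [lower + width / 2 for lower in lower_limits]
polygon = list(zip(midpoints, frequencies))

# Optional extension: join the X-axis (frequency zero) at the mid-value
# of the next possible class at each end.
extended = [(midpoints[0] - width, 0)] + polygon + [(midpoints[-1] + width, 0)]


def area_under(points):
    """Area under straight-line segments joining the points (trapezoid rule)."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


column_area = width * sum(frequencies)   # area of the rectangles
polygon_area = area_under(extended)      # area under the extended curve
print(column_area, polygon_area)
```

With equal class widths the two areas agree, which is the arithmetic 
counterpart of the statement that each triangle cut off equals the 
triangle added. 
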

Sometimes frequency distributions are encountered which refer to such 
information as number of children in a family, number of automobiles 
parked in a block, or other data which can have only values that are 
integers (0, 1, 2, 3, etc.). Frequency distributions dealing with variables 
of this sort, which we shall identify in Chapter 8 as “discrete,” are 
generally shown by a column diagram, rather than by a curve. Chart 
23.12, showing the data of Table 23.7, illustrates this point; the 
separation of the bars serves to emphasize the lack of continuity which is 
present. 

RULES FOR DRAWING CURVES 

While statisticians have not agreed upon a standard procedure setting 
forth in detail exactly how line diagrams should be constructed, there are 
certain rather obvious considerations of importance. The student who 
is interested in going into more detail in regard to the technique of chart 
construction is referred to a book dealing solely with that topic.⁴ 

Zero on vertical scale. The inclusion of a zero on the vertical scale 
of a curve is perhaps one of the most important rules. Chart makers 

[Chart: vertical scale in millions of bales; horizontal scale marked 1925, 
1929, 1933, 1937, 1941, 1945, 1949, 1953.] 

Chart 4.8. Production of Cotton in the United States, 1925-1953. This 
chart is incorrectly drawn, since the vertical scale begins with 8 and there is no clear 
indication of the omission of the zero. Data from sources given below Chart 4.2. 

all too frequently neglect to observe this principle and the result is always 
misleading, since the visual impression is incorrect. In Chart 4.2 the 
production of cotton in the United States from 1925 to 1953 was plotted 
with reference to a vertical scale beginning with zero. The same series 
of data appear in Chart 4.8, but on this chart the vertical scale begins at 
8,000,000 bales. Chart 4.8 gives the reader a visual impression which is 
quite contrary to the facts. For example, production in 1949 appears to 
have been about 10 times that for 1946, whereas Chart 4.2 shows clearly 
that 1949 production was only about twice as large as 1946 production. 
Very few readers notice the omission of zero on a vertical scale, and fewer 
still are apt to make due allowance for the omission in interpreting a 
curve. It should not be necessary for a reader to refer to a scale in order 
to make approximate comparisons; the chart should be so drawn that 
visual comparisons may be made as quickly as possible. 

⁴ For example, Mary E. Spear, Charting Statistics, McGraw-Hill Book Co., Inc., 
New York, 1952. Also W. C. Brinton, Graphic Presentation, Brinton Associates, 
New York, 1939. 
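
The distortion produced by omitting the zero can be expressed as a simple 
ratio of plotted heights. The following Python sketch is illustrative only: 
the function name `apparent_ratio` is an assumption, and the two production 
figures are round hypothetical values in millions of bales, not the data 
behind Charts 4.2 and 4.8. 

```python
def apparent_ratio(a, b, scale_start=0.0):
    """Ratio of the plotted heights of a and b when the vertical scale
    begins at scale_start instead of zero."""
    return (a - scale_start) / (b - scale_start)


# Hypothetical production figures, in millions of bales.
high_year, low_year = 16.0, 8.8

true_ratio = apparent_ratio(high_year, low_year)            # scale begins at zero
truncated_ratio = apparent_ratio(high_year, low_year, 8.0)  # scale begins at 8

print(round(true_ratio, 2), round(truncated_ratio, 2))
```

On a scale beginning at zero the larger figure is somewhat less than twice 
the smaller; on a scale beginning at 8 it is drawn about ten times as high, 
although the data are unchanged. 
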

Showing the zero as in Chart 4.2 would sometimes result in placing the 
curve high up on the grid and might also make the movements of the curve 
difficult to discern. Therefore, the omission of the zero on the vertical 
scale of a chart usually occurs because the person constructing the chart 

[Chart: vertical scale in millions of bales.] 

Chart 4.9. Production of Cotton in the United States, 1925-1953. Data from 
sources given below Chart 4.2. 

wishes to emphasize the movements of the curve and feels that the space 
between the curve and the X-axis is useless. There are several ways in 
which it is possible to show the zero (or to indicate clearly its omission), 
and also to avoid placing the curve high up on the chart. Chart 4.9 
shows a method in which a definite break is made across the chart. 
Sometimes the parallel lines are serrated (notched) instead of wavy. They 
may be drawn freehand or, as in Chart 4.9, by making use of a bread 
knife as a ruler. Charts 4.10, 4.11, and 4.19 show other devices which are 
occasionally used. Notice that Charts 4.9 and 4.19 show the zero and a 
scale break, while Charts 4.10 and 4.11 do not show the zero but merely 
call attention to the fact that the vertical scale is incomplete. 

Chart 4.12 appeared in the annual report of a large corporation. 
Because no warning is given of the omission of the zero on the vertical 
scale, this chart gives a misleading visual impression of the decrease in 
bonds and notes outstanding. Unless the vertical scale is consulted, the 
reader may conclude that outstanding bonds and notes have been nearly 
eliminated. 

Chart 4.10. Production of Cotton in the United States, 1925-1953. Data 
from sources given below Chart 4.2. 

[Chart: vertical scale in millions of bales.] 

Chart 4.11. Production of Cotton in the United States, 1925-1953. Data 
from sources given below Chart 4.2. 

Occasionally curves will be seen which lack a zero on the vertical scale 
and which show the growth of sales of a commodity, membership in an 
organization, circulation of a periodical, or other data. The omission 
of the zero makes the growth appear to be much more rapid than it really 
has been. 

Chart 4.13. Consumers Price Index of Food in the United States, 1935-1953, 
1947-1949 = 100. Data from Monthly Labor Review, February 1954, p. 236. 

Chart 4.13 shows index numbers of the retail prices of food. This 
chart is unusual in two respects. In the first place, it carries a zero for 
the vertical scale which, though not wrong, is not necessary when price 
index numbers are being plotted, because it is hardly conceivable that 
prices will ever approach zero and because 100 is the base of the index 
number. The 100 line should always be emphasized when it is the base, 
as in this chart. Similarly the zero line should be emphasized, as in 
Chart 4.9, when it is the base of the chart. When charting index numbers, 
some persons prefer to show the fluctuations above and below 100 in 
terms of positive and negative values. In the case of Chart 4.13, 100 
would become zero, 120 would become +20, and 75 would become -25. 
The vertical scale of Chart 4.13 would be altered to read +20, 0, -20, 
-40, -60, -80, and -100. The curve itself would remain unchanged. 
The second unusual feature of Chart 4.13 is the treatment of the 
horizontal and vertical guide lines, which results in giving the curve an 
unusually clear profile. Notice also that space has been left to add later 
data. This practice allows the same original chart to be reproduced 
time after time by merely extending the curve as new data become 
available. 
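
The relabeling of an index-number scale in terms of deviations from the 
base amounts to subtracting 100 from every value. A minimal Python sketch 
follows; the function name `deviation_from_base` is an assumption made for 
illustration, and the scale marks are those implied by the relabeled scale 
given in the text. 

```python
def deviation_from_base(index_value, base=100):
    """Express an index number as a deviation above (+) or below (-) its base."""
    return index_value - base


# The three readings mentioned in the text.
readings = [100, 120, 75]
deviations = [deviation_from_base(value) for value in readings]

# Scale marks of a chart running from 0 to 120 by 20s, relabeled.
scale_marks = [120, 100, 80, 60, 40, 20, 0]
relabeled = [deviation_from_base(mark) for mark in scale_marks]

print(deviations)
print(relabeled)
```

Because every value is shifted by the same constant, the shape of the curve 
is unchanged; only the labels on the vertical scale differ. 
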

Ruling curves. The curve or curves representing the data should 
stand out clearly from the background of the chart. The curve should 
therefore be ruled more heavily than the coordinates. (When two or 
more curves are shown which follow each other closely or which 
intertwine, it is sometimes necessary to use more lightly ruled lines for some of 
the curves. See, for example, Chart 17.3.) As will be seen from the 
various curves in this text, the plotted points are not usually shown, since 
the attempt is to present the general situation rather than the individual 
readings. 

When several curves are drawn on the same axis, it is important for the 
reader to be able to identify each curve. Thus we may use solid, dotted, 
and dashed lines, and we may use heavy and light lines. If a light line is 
used for a curve, it should ordinarily not be so light as the coordinates. 
The suggested rulings are listed below as A and B. 


[Three sets of sample line rulings, labeled A, B, and C.] 

A. These lines are recommended if not more than three curves are to be 
drawn. 

B. If more than three curves are to be drawn, these lighter lines may be 
used. 

C. These lines are not recommended unless plotted points are to be 
indicated by means of the circles or dots. 

When two or more curves appear on a chart, each should be clearly 
identified. This may be accomplished by labeling the curves as in Charts 
4.16, 4.21, and 17.3. 

It is ordinarily well to avoid the use of more than two or three curves 
on one chart. Particularly if they cross and re-cross, confusion is likely 
to result. When several curves appear on a large wall chart which is to be 
presented to a group, different colors may occasionally be used, though it 
is usually better practice to reserve the use of color for those occasions 
when special emphasis is to be placed on one or two curves. Black, red, 
green, light or medium blue, and medium or dark orange are readily 
distinguished. If there is a likelihood that the wall chart is to be 
photostated, photographed, or reproduced for printing, black and red may be 
used in solid and broken, light and heavy, combinations, since the red line 
will reproduce as black. Blue, yellow, and some shades of green 
photograph either not at all or faintly. Color is ordinarily too expensive to be 
used in a book. 

Coordinates. Chart makers emphasize the zero line by making it a 
little heavier than the other marginal lines. In similar fashion, a 100 per 
cent line (or other base with which comparisons are made) may be 
stressed. The marginal vertical and horizontal lines may be made slightly 
heavier than the other coordinate lines. 

The coordinate lines should be drawn very lightly. No more coordi- 
nate lines should appear than are necessary to assist in reading the chart. 
Occasionally all coordinates are omitted, as in Chart 4.4, which uses 
“tics” in lieu of coordinate lines. If it is desired to have a closely ruled 
grid in order to make plotting easy, the chart may be drawn on tracing 
cloth or tracing paper which has been placed over a grid which has the 
desired closely spaced coordinate lines. Alternatively, when a chart is to 
be reproduced, a closely ruled grid of light blue may be used. The lines 
which should appear in the reproduction are ruled in black. The blue 
lines of the background do not show up in the reproduction under ordinary 
conditions. Some of the charts in this text were drawn on such a light 
blue background. 

In order to insure a proper understanding of a chart, the two scales 
should be clearly labeled. Not only should the nature of the data be 
indicated, but the units used should also be stated. For example, in 
Chart 4.3 the horizontal axis shows incomes, the unit being thousands of 
dollars. Occasionally a curve of a long time series may be rather extended 
horizontally. In such instances it is sometimes desirable to repeat the 
vertical scale at the right of the chart. 

Chart 4.14. Production of Cotton in the United States, 1925-1953. The 
vertical dimension is exaggerated in relation to the horizontal 
dimension. Data from sources given below Chart 4.2. 

Chart proportions. It is hardly possible to give an objective rule as 
to the proper proportions for a curve diagram. It should be noted, 
however, that bizarre impressions result from over-expanding or 
over-contracting either scale used for a curve. In Chart 4.14 the vertical scale 
is exaggerated in relation to the horizontal scale; in Chart 4.15 the 
horizontal scale is exaggerated. The former gives an impression of 
tremendous fluctuations; the latter conveys the idea that cotton production has 
undergone relatively unimportant fluctuations. These two charts indicate 
distorted results of replotting the data shown properly in Chart 4.2. 
Rules of thumb are often unsatisfactory because they are apt to be adopted 
blindly. However, it has been suggested that the proper proportions are 
those which result in a 45-degree angle for the movements of the curve 
which are to be emphasized. 
Just as it is possible to overemphasize or to minimize fluctuations by 
poor choice of scales, so it is possible to create misleading impressions 
in regard to growth. One curve of Chart 5.3 shows automobile registrations 
in the United States for 1917-1953. Expanding the vertical scale and 
contracting the horizontal scale would give a visual impression of very 
rapid growth of United States automobile registrations; contracting the 
vertical scale and expanding the horizontal scale would make the growth 
appear to have been very slow. 

Although the two preceding paragraphs referred to curves of time series, 
it should be understood that misleading visual impressions may be given by 
curves of frequency distributions, and by virtually any other type of 
chart, if one scale is over-expanded or is unduly contracted in relation 
to the other scale. 




from sources given below Chart 4.2. 

Lettering. All lettering on a chart, including scale labels, scale 
values, legend, curve labels, and any other words or figures, should be 
placed horizontally, if possible. Occasionally space limitations may 
necessitate placing the vertical scale label in a vertical position (which 
may be the reason it was so placed in Chart 6.3), but such a limitation is 


Chart 4.16. Arrivals and Departures of United States Citizens, January 
1947-December 1952. For source of data, see Chart 4.4. The hatched areas 
represent excess of arrivals over departures; the stippled areas show excess of 
departures over arrivals. 

not often present. Needless to say, all lettering should be legible. 
Freehand words and figures may be made very attractive when executed 
by a skilled person. The amateur may, however, make excellent formal 
letters and figures with a little practice by the use of stencil lettering 
devices available from artists' or draftsmen's supply houses. Nearly all 
of the charts in this text, except those reproduced from other publications, 
were lettered by means of such devices. The lettering inside of the border 
of Chart 17.1, and some of the other inserts on charts elsewhere in this 
book, was done by use of a typewriter having block type. 

Title. Each chart, like each table, should have a title, which should 
state clearly and succinctly what the chart purports to show. The title 
of a printed chart may appear either above or below the chart, but 
preferably below. The titles of large wall charts are often placed above 
the grid or, sometimes, upon it. 

Source. Again, as in the case of a table, each chart should contain a 
source reference to indicate the author, title, volume, page, publisher, and 
date of the publication from which the data were taken. Naturally the 
cautions regarding comparability of data taken from the same source or 
different sources, mentioned in Chapter 2, apply with full force to the 
figures used for making charts. 

LINE DIAGRAMS FOR SPECIAL PURPOSES 

Net balance charts. Chart 4.4 shows one method of indicating the 
net total of two series. For each month, departures were subtracted 
from arrivals and the result plotted as a positive or negative figure. The 
balance of trade (value of exports minus value of imports) may be shown 
in the same manner, as may also profit and loss. An alternative method 
of showing the arrival and departure data is illustrated in Chart 4.16. 
Here the curves for arrivals and for departures are given; excess of 
arrivals is indicated by the height of the cross-hatched area, while the 
excess of departures is shown by the height of the stippled portion. 
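
The computation behind a net balance chart is a month-by-month 
subtraction. The following Python sketch uses hypothetical monthly figures 
(in thousands) rather than the data of Chart 4.4, and the variable names 
are chosen for illustration only. 

```python
# Hypothetical monthly figures, in thousands (not the data of Chart 4.4).
months = ["January", "February", "March"]
arrivals = [52, 48, 61]
departures = [45, 55, 61]

# Net balance: departures subtracted from arrivals for each month.
net = [a - d for a, d in zip(arrivals, departures)]

for month, value in zip(months, net):
    if value > 0:
        label = "excess of arrivals"
    elif value < 0:
        label = "excess of departures"
    else:
        label = "in balance"
    print(f"{month}: {value:+d} ({label})")
```

Positive values are plotted above the zero line and negative values below 
it; the same subtraction applies to exports and imports or to income and 
expense. 
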

Silhouette charts. Chart 4.16 (referred to in the preceding paragraph) 
illustrates not only the showing of net amounts rather than gross amounts, 
but likewise the practice of shading the area between two curves in order 
to obtain emphasis. Chart 4.17 is similar to Chart 4.4 in that it shows 
fluctuations above and below a base line. In Chart 4.17, however, the 
areas of the curve have been emphasized by filling in with black. The 
result is a more striking portrayal of the “plus” and “minus” parts of the 
curve. A chart of this type is even more effective when the “plus” areas 
are filled in with black and the “minus” areas are filled in with red. 

Maximum variation charts. The Library of Columbia University 
displayed in an illuminated glass case a number of valuable old prints. 
For the proper preservation of the prints it was desired to maintain the 
temperature between 70 and 80 degrees Fahrenheit. The problem 
consisted of adjusting radiation of heat from the case, ventilation and 
conduction, and the proximity to nearby radiators so that the temperature 
inside the case would remain within the desired limits. A recording 
thermometer was placed in the case and the temperature was continuously 
recorded over an extended period. In Chart 4.18 a four-day section of 
one of the charts is shown. During these days there was no heat in the 
adjoining radiator, and it may be seen that the temperature never fell 
below 70 degrees but did slightly exceed 80 degrees on several occasions. 
On Thursday, Friday, and Saturday the library was open to the public 
from 8 a.m. to 10 p.m.; on Sunday, from 2 to 6 p.m. The dashed lines 
have been added by the authors and serve to stress the limits beyond 
which the temperature should not fluctuate. 

Chart 4.17. A Portion of The Cleveland Trust Company's Chart of American 
Business Activity Since 1790. From the twenty-sixth edition of that chart, 
issued September 1954 by The Cleveland Trust Company. 



Chart 4.18. Temperature Fluctuations in a Library Display Case.

Temperature is in degrees Fahrenheit. The curved ordinates are made to
correspond to the arc described by the recording pen of the thermometer.
(From the Library of Columbia University.)


Range charts. Chart 4.19 shows a device by means of which the 
range of stock prices may be depicted. It will be noticed that the black 
band expands when the range is greater and contracts when the range is 
smaller. The white line indicates the closing price. An alternative 
method of showing the same data is illustrated in Chart 4.20. Here the
top of each bar represents the high for the day, while the bottom of each
bar represents the low for the day. The line connecting the bars represents
the closing price. Charts such as these may be used for showing
commodity prices and other sorts of data if it is desired to show a range of
variation over a period of time.

Chart 4.20. High, Low, and Closing Prices of 50 Stocks as Shown by the
New York Times Averages, January 5-February 27, 1953. Data from various
issues of the New York Times.

Z-charts. The Z-chart consists of three curves on the same axes as
shown in Chart 4.21. Usually the chart covers a period of one year, by
months. One curve shows the monthly figures, another shows the
cumulative figures from the beginning of the year, while the third shows
the total for the twelve months ending with each month. This last curve
is generally called the moving annual total curve; more specifically, it is a
12-month moving total for the twelve months ending with each designated
month. Two vertical scales are used with the Z-chart, since, if the
monthly data were plotted against the same scale as the other data, the
fluctuations of the monthly data would not be apparent. The Z-chart
is often used for internal business purposes, showing, for example, data
of production and sales. It is, of course, limited to those situations in
which the chart maker is interested in visualizing: (1) the figure for a
given month, (2) the figure for each month for that part of the calendar
(or fiscal) year which has elapsed, and (3) the figure for the twelve months
ending with each given month.
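
The three curves of a Z-chart can be computed from two years of monthly figures. The following is a minimal sketch in Python; the function name `z_chart_series` and the sample figures are hypothetical and are not the data of Chart 4.21.

```python
from itertools import accumulate


def z_chart_series(prev_year, this_year):
    """Return (monthly, cumulative, moving_annual_total) for one year.

    prev_year and this_year are lists of 12 monthly figures each.
    """
    if len(prev_year) != 12 or len(this_year) != 12:
        raise ValueError("each year needs 12 monthly figures")
    monthly = list(this_year)
    # Running total from the beginning of the current year.
    cumulative = list(accumulate(monthly))
    combined = list(prev_year) + monthly
    # Total for the twelve months ending with each month of the current year.
    moving_total = [sum(combined[i + 1 : i + 13]) for i in range(12)]
    return monthly, cumulative, moving_total


# Hypothetical figures for illustration only.
prev_year = [10] * 12
this_year = [12] * 12
monthly, cumulative, moving_total = z_chart_series(prev_year, this_year)
```

In December the cumulative curve and the moving annual total must meet, since both then cover the same twelve months; this is what gives the chart its Z shape.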

Except for special purposes such as this, it is not usually desirable to
use two, or more, vertical scales (sometimes referred to as “multiple
scales”) on a chart of the type described in this chapter. The occurrences
of fluctuations (but not their magnitudes) in two series expressed
in different units may occasionally be compared on a chart having two
different vertical scales. However, the use of two, or more, different
vertical scales is likely to give false visual impressions of the comparative
magnitudes of changes occurring in the various series.

Chart 4.21. Sales of Sears Roebuck and Company; Monthly, Cumulative,
and Moving Annual Total, 1953. Data from Survey of Current Business, February
1953, p. S-10, and February 1954, p. S-10.

Varying horizontal-scale charts. Occasionally it is desired to show 
annual data over a number of years, and monthly data for one or two 
more recent years. This may be done as in Chart 4.22, in which the
horizontal scale is expanded to show the monthly data in more detail. Notice
that the two parts of the chart are separated by a break. Similarly, a
change in horizontal scale may be in order if we wish to show a combina- 
tion of annual or monthly data with weekly data, or a combination of 
annual, monthly, or weekly data with daily data. 



Chart 4.22. Index of Ordinary Life Insurance Sales in 
New Jersey, Annually, 1937-1946, and Monthly, 1947-1953. 

Reproduced from Review of New Jersey Business, January 1954,
p. 17. 

Multiple-axis charts. Occasionally it is desirable to compare the
fluctuations of several curves and yet to have each curve stand out 
clearly. A simple method of accomplishing this result is to plot the differ- 
ent curves along different horizontal axes, these different X-axes being 
arbitrarily separated by convenient vertical distances. An illustration 
is Chart 14.5, which is also referred to as a “year-over-year chart.”
Here the different curves have been brought close together for ease of
comparison, but there is no crossing of the lines. Although different
horizontal axes are employed, the vertical and horizontal scales remain
the same. In interpreting such a chart on arithmetic graph paper (as 
distinguished from semi-logarithmic graph paper described in the follow- 
ing chapter), it should be remembered that the comparison afforded is 
that of absolute and not of relative changes. It is unlikely that the use 
of this type of chart will be found desirable for presentation to the general 
reader, unless the diagram is accompanied by a clear explanation. 

Component-part charts. Chart 4.23 shows the number of persons
in the United States at each census from 1850 to 1950, in each of four
general age groups. The height of each band indicates the number of
each age in the country at a given census. It is possible to observe, from
this type of chart, whether a given group is increasing or decreasing,
and whether the total of all groups is increasing or decreasing.
The relative importance of a particular group cannot be visualized from
Chart 4.23, but in Chart 4.24 the age groups are shown according to the
proportions which they constitute of the total population. Here it may
be clearly seen that there has been a decrease in the proportion of younger
persons and an increase in the proportion of older persons in the population.
When component-part data covering a few years are to be shown
graphically, a bar chart such as the upper part of Chart 6.17 or 6.18 may
be used. When a number of years are to be shown, the general trend can
be more easily pictured by curves.

Chart 4.23. Population of the United States in Each Specified Age Group,
1850-1950. Data from U. S. Bureau of the Census, Fifteenth Census of the United
States, 1930, Population Volume II, p. 576, and U. S. Census of Population, 1950,
Vol. II, Characteristics of the Population, Part I, United States Summary, p. 1-93.

Chart 4.24. Proportion of the Population of the United States in Each
Specified Age Group, 1850-1950. Data from sources given below Chart 4.23.
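
The step from a chart of absolute components to a chart of proportions is a simple division of each component by the total for its period. A minimal sketch follows; the function name `proportions` and the counts are hypothetical and are not census figures.

```python
def proportions(groups):
    """Convert counts per period into per cent of each period's total.

    groups maps a group name to a list of counts, one per period.
    """
    names = list(groups)
    n_periods = len(groups[names[0]])
    # Total of all groups in each period (the top edge of the band chart).
    totals = [sum(groups[name][i] for name in names) for i in range(n_periods)]
    return {
        name: [100.0 * groups[name][i] / totals[i] for i in range(n_periods)]
        for name in names
    }


# Hypothetical counts for two periods, for illustration only.
counts = {"younger": [30, 40], "older": [10, 40]}
shares = proportions(counts)
```

In this invented example both groups grow in absolute number, yet the share of the younger group falls from 75 to 50 per cent, which is the kind of shift that only the proportional chart makes visible.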

Frequency distribution and range chart. Sometimes it is advantageous
to show a frequency distribution curve for one set of data and to
compare with that curve the range of values for another distribution.
Chart 4.25 shows a frequency distribution of the average straight-time
weekly earnings of 14,817 female secretaries in non-manufacturing
industries in New York City in January 1952. A non-commercial organization
was interested in knowing how its secretarial salaries compared
with these and showed the range of its own salaries as indicated on the
chart. Alternatively, two frequency distributions could have been shown,
as in Chart 8.7.

Chart 4.25. Weekly Earnings of 14,817 Female Secretaries in Non-Manufacturing
Industries in New York City and Range of Pay for Female Secretaries
in a Non-Commercial Organization, January 1952. The data of weekly
earnings in New York City are from Table 8.5 and are “frequency densities,” which
are explained in the discussion concerning Chart 8.5.



CHAPTER 5 


Graphic Presentation II:

THE SEMI-LOGARITHMIC OR RATIO CHART


AMOUNT OF CHANGE VS. RATIO OF CHANGE

When considering the development of a series of statistical data over a
period of time, we are sometimes interested in the amount of change that
has taken place, but more often we wish to know something about the
ratio of change that has occurred between two dates. Diagrams such as
Charts 4.2, 4.4, and various others in Chapter 4 are of the familiar type,
having what are termed arithmetic scales, and are of use, primarily, for
indicating absolute changes in the factor shown on the Y-axis. It is the
purpose of this discussion to explain a slightly different sort of grid which
enables one to visualize the ratio of change in a plotted series.


TABLE 5.1

An Arithmetic Progression

Year          Y value     Amount
(X value)                 of increase

1946               0
1947             200         200
1948             400         200
1949             600         200
1950             800         200
1951           1,000         200
1952           1,200         200
1953           1,400         200


The ability of the usual type of chart to give a satisfactory visual
impression of absolute change, but not of ratio of change, is brought out
by Chart 5.1. Curve A represents a constant amount of increase of 200
units per year (see Table 5.1), and this, or any other, arithmetic progression
(constant amount of increase or decrease) will be depicted by a straight
line when plotted on the conventional or arithmetic grid. Curve B,
however, is the result of plotting a series of figures which begin with 128
and increase 50 per cent each year (see Table 5.2). It will be noticed
that this curve is not a straight line; the curve bends upward more and
more sharply as time passes.

Chart 5.1. An Arithmetic Progression (A) and a Geometric Progression (B)
Plotted on an Arithmetic Grid. Data of Tables 5.1 and 5.2.

TABLE 5.2

A Geometric Progression

Year          Y value     Per cent
(X value)                 of increase

1946             128
1947             192          50
1948             288          50
1949             432          50
1950             648          50
1951             972          50
1952           1,458          50
1953           2,187          50

A series showing a constant ratio of increase or decrease is known as a
geometric progression, and any geometric progression will yield a curved
line when plotted on an arithmetic grid.* An increasing geometric progression
is represented by a curve which slopes upward and is concave
upward, as in Curve B of Chart 5.1; a decreasing geometric progression is
represented by a curve which slopes downward and is concave upward.
A serious difficulty in interpreting such curves, however, lies in the fact
that the eye cannot discern whether or not a particular curved line
represents a constant ratio of change. Chart 5.2 depicts a
series which is neither an arithmetic nor a geometric progression. The
data of Table 5.3 show that the series increases more rapidly than an
arithmetic progression, and the eye can grasp this fact because the curve
bends upward. The table also indicates that the ratio of increase of the
series is not constant. Visually, however, this fact is not apparent. It
is not possible for the reader of an arithmetic chart to be sure whether
a curved line, such as this, represents a constant ratio of increase, a ratio
of increase which is diminishing, or a ratio of increase which is accelerating.
Any series of figures that increases more rapidly than an arithmetic
progression (for example, 10, 12, 15, 19, 24, 30) slopes upward and is
concave upward when plotted on an arithmetic grid; any series of figures
that decreases less rapidly than an arithmetic progression (for example,
100, 91, 83, 76, 70, 65) slopes downward and is concave upward when
shown on arithmetic coordinates.


TABLE 5.3

A Series of Increasing Values

Year          Y value     Amount          Per cent
(X value)                 of increase     increase

1946              50
1947              80          30            60.0
1948             160          80           100.0
1949             300         140            87.5
1950             550         250            83.3
1951           1,080         530            96.4
1952           1,730         650            60.2
1953           2,500         770            44.5
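
What the eye cannot judge from an arithmetic chart is easily checked by arithmetic. The sketch below tests a series for constant differences (an arithmetic progression) and for constant ratios (a geometric progression), using the Y values of Tables 5.1, 5.2, and 5.3. The function name `classify` and the tolerance are illustrative choices, not part of the text.

```python
def classify(values, tol=1e-9):
    """Label a series 'arithmetic', 'geometric', or 'neither'."""
    pairs = list(zip(values, values[1:]))
    # Constant amount of change between successive values.
    diffs = [b - a for a, b in pairs]
    if all(abs(d - diffs[0]) <= tol for d in diffs):
        return "arithmetic"
    # Constant ratio of change; a ratio is undefined after a zero value.
    if all(a != 0 for a, _ in pairs):
        ratios = [b / a for a, b in pairs]
        if all(abs(r - ratios[0]) <= tol for r in ratios):
            return "geometric"
    return "neither"


table_5_1 = [0, 200, 400, 600, 800, 1000, 1200, 1400]
table_5_2 = [128, 192, 288, 432, 648, 972, 1458, 2187]
table_5_3 = [50, 80, 160, 300, 550, 1080, 1730, 2500]

kinds = [classify(t) for t in (table_5_1, table_5_2, table_5_3)]
```

The series of Table 5.3 falls in the third class, although on an arithmetic grid its curve resembles that of Table 5.2.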

Before proceeding to develop the basis for the semi-logarithmic or ratio
grid, which will enable us to visualize ratios of change, let us examine further
the arithmetic grid. Chart 5.3 shows the growth of motor vehicle
registrations in the United States and in Canada from 1917 to 1953. We
can see from this chart that registrations in the United States increased
rapidly and, apparently, in approximately an arithmetic progression from
1917 to 1929; held fairly constant from 1929 to 1930; dropped in 1931,
1932, and 1933; and resumed the upward movement from 1934 to 1937,
only to fall slightly in 1938. They rose from 1938 to 1941, fell from 1941
to 1944, and increased from 1945 to 1953, showing approximately an
arithmetic progression from 1945 to 1951. Changes in registration in
Canada are difficult to see because the scale which must be used to accommodate
the United States causes the curve for Canada to fall rather close
to the base line. However, it appears that registrations in Canada
increased from 1917 to 1930; decreased in 1931, 1932, and 1933; increased
again to 1941; declined very slightly for 4 years; and increased thereafter.
It is quite obvious that the amounts of increase and decrease each year
were greater for the United States than for Canada, but there is no way
of knowing from the appearance of the curves which country had the
greater ratios of increase or decrease from year to year.

Chart 5.2. A Series of Figures Increasing by Increasing Amounts. This
series is not a geometric progression, but may give that visual impression. Data of
Table 5.3.

* A curve representing a geometric progression is termed an “exponential curve”
and is indicated by the equation $Y = ab^X$. The reader may be familiar with this
equation in the form $P_n = P_0(1 + r)^n$, which is the compound interest equation and
is discussed in Chapter 9. A straight line representing an arithmetic progression is
indicated by $Y = a + bX$.

It would not do to replot the data of Chart 5.3 by using one vertical
scale for the United States and another for Canada, in order to magnify
the movements of the curve for the latter. The fact that one curve is
below another on an arithmetic grid tells us at a glance that the lower
curve represents a series of smaller magnitude than does the upper. If
two vertical scales are used, we have really two distinct, non-comparable
charts, and no satisfactory visual comparisons may be made in respect to
(1) the size of the two series plotted, (2) the amount of change which has
taken place in one series in comparison with the amount of change in the
other, or (3) the ratios of change of the two series.

Chart 5.3. Motor Vehicle Registrations in the United States
and Canada, 1917-1953. Vertical scale: millions of vehicles. Data from
Automobile Manufacturers Association, Automobile Facts and Figures, 1953,
p. 21; The Canada Year Book, 1937, p. 668, 1948-49, p. 707, 1954, p. 811;
Table MV-1, 1953, of State Motor-Vehicle Registrations, issued by the Bureau of
Public Roads; and by correspondence from the Dominion Bureau of
Statistics.

A GRID TO SHOW RATIOS OF CHANGE 

From what has already been said it must be obvious that graphic comparisons
in respect to ratios of change will be facilitated if we can employ
a sort of grid which will make a constant ratio of increase (or decrease)
appear as a straight line. In Table 5.4 the geometric progression of
Table 5.2 and Chart 5.1 is again shown, and with it are given the logarithms
of the various numbers. Examination of these logarithms reveals
that they form an arithmetic progression; therefore, if these logarithms
are plotted on an arithmetic grid, a straight line will result, as may be
seen in Chart 5.4. This is one way of accomplishing our objective, but
it involves the additional step of looking up logarithms before the data
can be plotted. However, instead of plotting the logarithms of the values
of a series, we may use a grid which is designed with a logarithmic
vertical scale, as in Chart 5.5. Here, again, we find that the geometric
progression appears as a straight line. A grid of this type is termed
semi-logarithmic because one scale is logarithmic and the other is arithmetic.

Chart 5.4. Logarithms of a Geometric Progression Plotted on an
Arithmetic Grid. Data of Table 5.4.


TABLE 5.4

A Geometric Progression and Logarithms of the
Geometric Progression

Year          Y value     Logarithm      Amount of
(X value)                 of Y value     increase of
                                         logarithms

1946             128      2.107210
1947             192      2.283301       0.176091
1948             288      2.459392       0.176091
1949             432      2.635484       0.176092*
1950             648      2.811575       0.176091
1951             972      2.987666       0.176091
1952           1,458      3.163758       0.176092*
1953           2,187      3.339849       0.176091

* These values differ slightly because the logarithms were
rounded to the nearest millionth.


Chart 5.5. A Geometric Progression Plotted on a Semi-Logarithmic or
Ratio Grid. Data of Table 5.2. Printed semi-logarithmic forms have more
intermediate rulings than shown in this chart. These closely spaced lines are an aid to
plotting but are omitted from most of the charts in this book, since reduction to fit
the size of the page would result in bringing these lines very close together. The
detailed ruling is shown in Chart 5.18.
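
The logarithm column of Table 5.4 can be reproduced without a table of logarithms. The following sketch uses Python's standard `math.log10`; the variable names are illustrative only.

```python
import math

# Y values of the geometric progression in Table 5.2.
values = [128, 192, 288, 432, 648, 972, 1458, 2187]

# Common (base 10) logarithm of each value.
logs = [math.log10(v) for v in values]

# Amount of increase from each logarithm to the next.
steps = [b - a for a, b in zip(logs, logs[1:])]
```

Every step equals the logarithm of 1.5, the constant ratio of the series, so the logarithms form an arithmetic progression. The occasional 0.176092 in the table arises only from rounding each logarithm to six places before subtracting.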

The logarithmic scale. The construction of the logarithmic scale merely
involves spacing the vertical-scale values in proportion to the differences
between their logarithms. Referring to Chart 5.6, it will be found that
the distance from 2 to 3 on the scale is 0.352 inch, and from 3 to 4 is
0.250 inch. We then have:

$$\log 3 - \log 2 = 0.477 - 0.301 = 0.176 \quad \text{(0.352 inch)}$$
$$\log 4 - \log 3 = 0.602 - 0.477 = 0.125 \quad \text{(0.250 inch)}$$

and the proportion is:

$$0.176 : 0.125 = 0.352 \text{ inch} : 0.250 \text{ inch}.$$

Chart 5.6. The Logarithmic Scale.

An alternative approach to an understanding of the logarithmic scale does
not involve logarithms. Reference to Chart 5.1 will recall that equal
distances on the vertical scale of an arithmetic grid represent equal
amounts. Equal distances measured along a logarithmic scale, however,
represent equal ratios. On the vertical scale of Chart 5.5 it may be seen
that the distance from 100 to 200 is 0.48 inch; likewise the distance from
300 to 600 is 0.48 inch. Measurement will reveal that any two numbers of
ratio 1:2 are separated by 0.48 inch on this scale. On this same scale the
distance from 200 to 800 is 0.96 inch, and it follows that any two numbers
of ratio 1:4 will be separated by 0.96 inch. Thus we see why the
semi-logarithmic chart is frequently termed the ratio chart.
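
The spacing rule can be checked numerically. In the sketch below the function `log_scale_distance` is hypothetical, and the figure of 2.0 inches per cycle is an assumption inferred from the measurements quoted for Chart 5.6 (0.352 inch for a logarithmic difference of 0.176); it is not stated in the text.

```python
import math


def log_scale_distance(low, high, inches_per_cycle):
    """Distance between two values on a logarithmic scale.

    One cycle (a tenfold increase) occupies inches_per_cycle inches.
    """
    if low <= 0 or high <= 0:
        raise ValueError("a logarithmic scale has only positive values")
    return inches_per_cycle * (math.log10(high) - math.log10(low))


# Assumed cycle length for Chart 5.6, inferred as described above.
cycle = 2.0
d_2_to_3 = log_scale_distance(2, 3, cycle)
d_3_to_4 = log_scale_distance(3, 4, cycle)

# Equal ratios occupy equal distances, whatever the cycle length.
d_100_to_200 = log_scale_distance(100, 200, cycle)
d_300_to_600 = log_scale_distance(300, 600, cycle)
d_200_to_800 = log_scale_distance(200, 800, cycle)
```

Under that assumption the two computed distances agree with the 0.352 inch and 0.250 inch given in the text to within a thousandth of an inch, and the distance for a ratio of 1:4 is exactly twice that for a ratio of 1:2.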

The vertical scale of Chart 5.5 is divided into two parts which are
generally called cycles. We therefore refer to the paper on which Chart
5.5 was drawn as “two-cycle semi-logarithmic paper.” In labeling the
vertical scale of a semi-logarithmic chart, we may begin with any positive
value. The figure at the top of the first cycle will be ten times that at
the bottom of the cycle; the figure at the top of the second cycle will be
ten times the figure at the bottom of the second cycle (the top of the first
cycle); and so on.* In Chart 5.7 there are illustrated eight different
logarithmic scales beginning with 0.1, 1, 2, 5, 10, 17, 25, and 50, respectively.
Although it is mathematically permissible to begin a logarithmic
scale with any positive value, it is advisable to select a scale which will
allow interpolations of intermediate values to be made readily. The
scale beginning with 17 would be very difficult to use. If it were desired
to have a three-cycle scale beginning with 0.5, the various values of the
first scale could be multiplied by 5. Most ready-ruled semi-logarithmic
paper carries along the right edge of the grid such designations as those
shown in Chart 5.18. These are multiplying factors and indicate that
the value to be written opposite each horizontal line on the left scale
must be the value at the bottom of that cycle multiplied by the figure
shown opposite that horizontal line on the scale at the right.

Chart 5.7. Logarithmic Vertical Scales. The scale beginning with 17
would be difficult to use.
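
The labeling rule amounts to multiplying the bottom value of each cycle by the factors 1 to 10. A minimal sketch follows; the function name `cycle_labels` and the choice of factors 1 through 10 are illustrative assumptions, since the exact designations of Chart 5.18 are not reproduced here.

```python
def cycle_labels(start, cycles, multipliers=(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)):
    """Return the scale labels for each cycle, lowest cycle first."""
    if start <= 0:
        raise ValueError("a logarithmic scale must begin with a positive value")
    labels = []
    bottom = start
    for _ in range(cycles):
        # Each label is the bottom of the cycle times a multiplying factor.
        labels.append([bottom * m for m in multipliers])
        # The top of one cycle is the bottom of the next.
        bottom *= 10
    return labels


easy = cycle_labels(1, 2)
awkward = cycle_labels(17, 2)
```

The scale beginning with 17 yields labels such as 34, 51, and 68, which shows why interpolation on it is troublesome, whereas the scale beginning with 1 yields round figures throughout.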

If a logarithmic scale were begun with zero, the top of the first cycle
would be 10 × 0 = 0, and all values on the scale would also be zero.
Suppose that the uppermost value of a three-cycle logarithmic scale is
0.01. Then the bottom of the third cycle is 1/10 of 0.01, or 0.001; the
bottom of the second cycle is 0.0001; and the bottom of the first cycle is
0.00001. There can thus be no zero base line, and the semi-logarithmic
chart does not permit interpretation of curves in terms of distances above 
a base line as does the arithmetic chart. Although plotted values may, 
of course, be read against the vertical logarithmic scale, no visual impres- 
sion may be had of the absolute magnitudes plotted. The semi-logarith- 
mic chart shows: (1) a constant ratio of change as a straight line; (2) the 
ratio of increase or decrease by the slope of the line; and (3) the compari- 
son of ratios between two or more lines by means of parallelism of these 
lines or lack of it. 

Whenever a logarithmic scale is employed, enough rulings, or rulings
and ticks, should be shown so that the reader will be aware that he is not
seeing a chart drawn on an arithmetic grid. Since there are other
unequally spaced scales in addition to the logarithmic scale (for example,
the reciprocal scale), it is sometimes also desirable to state: “ratio
chart,” “semi-logarithmic chart,” or “logarithmic vertical scale.”

Note that a logarithmic scale may cover an integral number of cycles, 
as in Chart 5.5, which has two cycles. On the other hand we may use 
part of one cycle, as in Chart 5.14, or we may employ one or more cycles
and part of another cycle, as in Chart 5.9. 

* A common logarithm is the power to which 10 must be raised to produce a given
number. Thus, 100 is $10^2$, and the logarithm of 100 is 2.0; 10,000 is $10^4$, and the
logarithm of 10,000 is 4.0.

Chart 5.8A. Curves on Arithmetic and Semi-Logarithmic
Grids. The two curves in each of the lower eight squares are
equidistant vertically from each other.


Arithmetic Vertical Scales

A, A': Constant amounts of increase, same for both curves.
B, B': Different constant amounts of increase, greater for B.
C, C': Different constant amounts of increase, greater for C'.
D, D': Constant amounts of decrease, same for both curves.
E, E': Different constant amounts of decrease, greater for E.
F, F': Different constant amounts of decrease, greater for F'.
G, G': Amounts of increase increasing, same for both curves.
H, H': Amounts of increase decreasing, same for both curves.
I, I': Amounts of decrease increasing, same for both curves.
J, J': Amounts of decrease decreasing, same for both curves.


Logarithmic Vertical Scales

a, a': Constant relative increases, same for both curves.
b, b': Different constant relative increases, greater for b.
c, c': Different constant relative increases, greater for c'.
d, d': Constant relative decreases, same for both curves.
e, e': Different constant relative decreases, greater for e.
f, f': Different constant relative decreases, greater for f'.
g, g': Relative increases, increasing, same for both curves.
h, h': Relative increases, decreasing, same for both curves.
i, i': Relative decreases, increasing, same for both curves.
j, j': Relative decreases, decreasing, same for both curves.




Column headings: Arithmetic vertical scales; Logarithmic vertical scales.

Series compared:

An arithmetic progression.
A series in which the absolute change is increasing:
  a. If relative change is increasing.
  b. If relative change is constant.
  c. If relative change is decreasing.
A series in which the absolute change is decreasing.
Two arithmetic progressions, same absolute changes.
A series in which the relative change is increasing.
A series in which the relative change is decreasing:
  A. If absolute change is increasing.
  B. If absolute change is constant.
  C. If absolute change is decreasing.
Two geometric progressions, same relative changes.

Chart 5.8B. Comparison of Series of Various Types Plotted in Relation to
Arithmetic and Logarithmic Vertical Scales. Series plotted as shown on one
grid appear as indicated on the other. The above comparisons refer to increasing
series only. It is suggested that the reader sketch some comparisons involving
declining series.

Interpretation of curves. Before proceeding with a consideration of
applications of the semi-logarithmic chart, attention should be given to
Charts 5.8A and 5.8B and the comments below them. When two straight
lines are parallel on semi-logarithmic paper (for example, a, a'; d, d'), we
know that they have constant ratios of change and also that the ratio
between the two has remained constant. Parallelism between curved
lines is very difficult to judge with the eye. Reference to the lower sections
of Chart 5.8A will show that the curved lines are always the same
vertical distance apart, and thus the two curves in each section are parallel
with respect to the X-axis.

APPLICATIONS 

Comparing ratios of increase or decrease. Since there is no zero
on the vertical scale of the semi-logarithmic chart, and thus no base line,
and since equal vertical distances (on the same scale) always represent
the same ratio, it is permissible to use two or more different vertical scales
in order to bring curves of different magnitude close together for comparison.
This has been done in Chart 5.9, which presents the data of
motor vehicle registrations previously shown on an arithmetic grid in
Chart 5.3. Shifting the vertical scale of a semi-logarithmic chart moves
the curve upward or downward, but the slope, which is of paramount
importance, is not altered thereby. When using two logarithmic scales,
as in Chart 5.9, it is desirable (though not absolutely necessary) to keep
the series of smaller magnitude below that of greater magnitude; likewise,
if one or more components are being compared with a total, the curves
for the components should be below that for the total.

Chart 5.9. Motor Vehicle Registrations in the United States and Canada,
1917-1953. Vertical scales: United States, millions of vehicles; Canada,
thousands of vehicles. Data from sources given below Chart 5.3.
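
That a shift of the logarithmic scale leaves the slope unaltered may be verified by comparing the logarithmic steps of a series with those of the same series multiplied by a constant. The sketch below is illustrative; the function `log_steps` is hypothetical and the figures are the first four Y values of Table 5.3, not registration data.

```python
import math


def log_steps(values):
    """Vertical rise between successive points on a logarithmic scale."""
    logs = [math.log10(v) for v in values]
    return [b - a for a, b in zip(logs, logs[1:])]


series = [50, 80, 160, 300]
# The same series expressed in units one thousand times smaller.
rescaled = [v * 1000 for v in series]

original_steps = log_steps(series)
rescaled_steps = log_steps(rescaled)
```

The two lists of steps agree, apart from floating-point rounding, because multiplying every value by a constant adds the same amount to every logarithm; the curve is moved bodily up or down without any change of shape.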

Chart 5.3 gave us no idea of the relative growth of automobile registrations
in either the United States or Canada. Chart 5.9, however, shows
relative growth for each series and enables us to compare the ratios of
growth of these two series of dissimilar size. In general, both series have
shown about the same ratios of increase and decrease throughout the
period. However, the ratio of increase from 1947 to 1953 is seen to be
greater for Canada. The insert on Chart 5.9 makes it possible to estimate
the ratio of increase or decrease from any one year to the next for the
curves shown. It does not, however, apply to other charts which have
different scales.

Chart 5.10. Annual Per Cent of Increase or Decrease in Motor Vehicle
Registrations in the United States and Canada, 1918-1953. Vertical scale:
per cent of change. Data from sources given below Chart 5.3.

An alternative method of showing the relative change in motor vehicle 
registrations in the United States and Canada consists of calculating the 
per cent of change for each year and plotting the results on an arithmetic 
grid. This has been done in Chart 5.10.

Instead of comparing the percentages of change of two different series
over the same period of time, we may be interested in comparing ratios
of growth of the same series at different times. Thus in Chart 5.9 we can
see that the per cent of increase of United States automobile registrations
was greater from 1950 to 1951 than from 1951 to 1952, and also that the
relative decline was greater from 1942 to 1943 than from 1943 to 1944.
Similar conclusions may be drawn from Chart 5.10.
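
The per cent of change plotted in a chart such as Chart 5.10 is computed for each year from the figure for the year before. Since the registration figures themselves are not given in the text, the sketch below applies the calculation to the Y values of Table 5.3; the function name `percent_changes` is illustrative.

```python
def percent_changes(values):
    """Per cent of change from each value to the next."""
    return [100.0 * (b - a) / a for a, b in zip(values, values[1:])]


# Y values of Table 5.3, 1946 to 1953.
table_5_3 = [50, 80, 160, 300, 550, 1080, 1730, 2500]

changes = [round(p, 1) for p in percent_changes(table_5_3)]
```

The rounded results reproduce the last column of Table 5.3, and plotting them on an arithmetic grid gives the same information that the slope of the curve conveys on a semi-logarithmic grid.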

It is frequently necessary to compare series which are expressed in
different units. For example, we may compare any two or more of the
following: commercial failures, in millions of dollars; volume of trading on
a stock exchange, in number of shares traded; coal production, in 2,000-pound
tons; petroleum production, in 42-gallon barrels; lumber production,
in board feet; cement production, in 376-pound barrels; electric
power produced, in kilowatt hours; manufactured gas, in cubic feet. It
is possible to reduce 376-pound barrels to tons, but it is not possible to
change kilowatt hours to board feet, or vice versa.

Chart 5.11. Average Monthly Production of Electric Power and of Portland
Cement, 1935-1953. Vertical scales: electric power, millions of kilowatt
hours; cement, thousands of barrels. Data from U. S. Department of Commerce,
Office of Business Economics, Business Statistics, 1953, pp. 131 and 183, and from
Survey of Current Business, February 1954, pp. S-20 and S-38.

While one could plot two series expressed in different units on an 
arithmetic grid, it is not often that such a comparison is useful. Except 
to ascertain whether the two series fluctuate concurrently, we are not 
likely to be interested in comparing the changes in electric power 
production in kilowatt hours with the changes in cement production in 
barrels. Rather are we apt to want to compare the percentage change in 
electric power production with the percentage change in cement 
production. On the semi-logarithmic grid, there is no zero base line; only 
the slope of a curve has meaning, and we are enabled to make a valid 
comparison of the relative changes in the two series expressed in such 
dissimilar units as those just mentioned. Chart 5.11 shows a comparison of 
the production of electric energy and of portland cement. Among other 
interesting comparisons may be noted the more rapid ratio of growth in 
the production of electric power from 1949 to 1953 and the relatively 
more severe decline in production of cement from 1937 to 1938, the only 
year during which both series dropped. 
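The statement that only relative changes can be compared across unlike units may be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the figures are hypothetical (they are not the data of Chart 5.11), and the function names are our own. It shows that two series with the same percentage change rise by the same vertical distance on a logarithmic scale, and that converting 376-pound barrels to 2,000-pound tons leaves that distance unchanged.

```python
import math

def pct_change(earlier, later):
    """Percentage change from the earlier value to the later value."""
    return (later - earlier) / earlier * 100.0

def log_rise(earlier, later):
    """Vertical distance between two points on a logarithmic scale (log10 units)."""
    return math.log10(later) - math.log10(earlier)

# Hypothetical values, for illustration only.
power_kwh = (8000.0, 10000.0)    # millions of kilowatt hours
cement_bbl = (16000.0, 20000.0)  # thousands of 376-pound barrels

power_pct = pct_change(*power_kwh)
cement_pct = pct_change(*cement_bbl)
power_rise = log_rise(*power_kwh)
cement_rise = log_rise(*cement_bbl)

# Changing the unit (barrels to short tons) multiplies every value by a
# constant, which shifts the curve vertically but does not alter its slope.
cement_tons = tuple(v * 376.0 / 2000.0 for v in cement_bbl)
cement_tons_rise = log_rise(*cement_tons)

print(f"electric power: {power_pct:.1f} per cent, log rise {power_rise:.5f}")
print(f"cement:         {cement_pct:.1f} per cent, log rise {cement_rise:.5f}")
print(f"cement in tons: log rise {cement_tons_rise:.5f}")
```

On an arithmetic grid the two absolute changes (2,000 and 4,000 in their respective units) would not be comparable at all; on the logarithmic scale the two rises coincide.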

Comparing fluctuations. Comparison of the fluctuations taking 
place in two chronological series of different size may be illustrated by 


Chart 5.12. Production of Bituminous Coal and Coke, 1935-1953. Data
from U. S. Department of Commerce, Office of Business Economics, Business
Statistics, 1953, pp. [illegible] and 170, and Survey of Current Business, March
1954, pp. S-34 and S-35. Figures for coke include byproduct (oven), beehive,
and petroleum coke. [Vertical scale: millions of short tons.]

reference to Charts 5.12 and 5.13, which show the production of 
bituminous coal and of coke for 1935-1953. Both series are expressed in terms 
of short tons, but production of bituminous coal greatly exceeds the 
production of coke. The result is that when the two series are shown 
on an arithmetic grid, as in Chart 5.12, the fluctuations of the larger 
series may be clearly seen but those of the smaller series are not apparent. 
When the two sets of data are depicted on a semi-logarithmic grid (Chart 
5.13), not only can the fluctuations of both series be seen, but their 
relative severity may be compared. For example, it is clear from Chart 5.13 
that the ratio of increase in the production of coke from 1938 to 1940 was 
greater than the ratio of increase in the production of bituminous coal for 
these same years, and also that the relative decrease from 1948 to 1949 was 
greater for coal than for coke. 

Instead of being interested in two series, we may wish to compare the 
undulations of a single series which fluctuated around relatively small 
values during one period and around decidedly larger values at another 
time. For example, commercial failures were around $100,000,000 to 
$200,000,000 annually from 1895 to 1910. From 1921 to 1933 they 
ranged from $400,000,000 to $933,000,000. In the early 1950's, they 



Chart 5.13. Production of Bituminous Coal and Coke, 1935-1953. Data 
from sources given for Chart 5.12. 

were lower again. The semi-logarithmic chart enables us to study the 
relative severity of the fluctuations during such different periods. 

Showing ratios. Chart 5.14 shows how ratios may be presented on 
the semi-logarithmic chart. The two series plotted are the price per 
bushel received by farmers for corn, and the price per 100 pounds received 
by farmers for hogs. When corn is bringing a price which is low in 
relation to the price of hogs, farmers will generally find it profitable to feed 
corn to hogs rather than to sell the corn for cash. On the other hand, 
when corn is bringing a price which is high in relation to that of hogs, 
farmers will tend to sell corn for cash. If 100 pounds of hogs brings the 
farmer about 13 times as much as a bushel of corn, it is largely immaterial 
to the farmer whether he sells his corn for cash or feeds the corn to his 
hogs.* For this reason the two scales of Chart 5.14 have been placed in a 

* See page 145, where the hog-corn ratio is discussed. 

13-to-1 ratio.† The chart not only shows the fluctuations in the price of 
hogs and the price of corn, but also makes it easy to see when the price of 
100 pounds of hogs is more than, less than, or exactly 13 times the price of 
a bushel of corn. When 100 pounds of hogs is selling for more than 13 
times as much as a bushel of corn, the curve for hogs is above the curve 
for corn, hogs are relatively valuable, and farmers tend to feed corn to 



Chart 5.14. Average Farm Prices of Corn, per Bushel, and of Hogs, per 
Hundred Pounds, January 1948-December 1952. The supplementary scale 
enables us to read the ratio of hog prices to corn prices for any month. The value 
13 is placed opposite the line for corn and the value opposite the hog line gives the 
ratio of the hog price, per hundred pounds, to the corn price, per bushel. For 
March 1952, the ratio is shown to be slightly more than 10, which may be verified 
by referring to Chart 5.15. The supplementary scale is graduated in the same 
manner as is the scale at the right of the chart, the figure 13 being placed opposite 
the corn line because the scale for hog prices has values which are 13 times the 
corresponding values on the scale for corn prices. Data from U. S. Department of 
Agriculture, Production and Marketing Administration, Market News, Livestock 
Branch, Statistical Bulletin No. 118, November 1952, p. 40, and Bureau of 
Agricultural Economics, Statistical Survey, December 1951-February 1953. 

their hogs. When 100 pounds of hogs is selling for less than 13 times as 
much as a bushel of corn, the curve for hogs is below that for corn, corn 
is relatively valuable, and farmers tend to sell corn for cash. When the 
two curves are parallel, the ratio is remaining constant; when the 
corn-price curve is sloping upward more rapidly (or downward less rapidly) 
than the hog-price curve, corn is becoming more valuable in relation to 
hogs; when the corn-price curve is sloping upward less rapidly (or 
downward more rapidly) than the hog-price curve, corn is becoming less 
valuable in relation to hogs. The supplementary scale, which is a 
separate piece of paper and which is shown on the chart, enables the reader 
to measure the ratio between the two price curves at any time. 

† The scale for hog prices is awkward but is unavoidable in this instance. 

Chart 5.15 illustrates another method of showing the relationship 
between hog and corn prices. Here the ratio of hog prices to corn prices 
has been computed for each month and plotted on an arithmetic grid. 


Chart 5.15. Hog-Corn Ratio, January 1948-December 1952. The 
ratio is obtained by dividing the average farm price of hogs per hundred 
pounds by the average price of corn per bushel; the ratio is the number of 
bushels of corn required to buy a hundred pounds of live hogs at the prices 
quoted. Data from U. S. Department of Agriculture, Production and 
Marketing Administration, Market News, Livestock Branch, Statistical 
Bulletin No. 118, November 1952, p. 39, and Bureau of Agricultural Economics, 
Crop Reporting Board, Agricultural Prices, June [day illegible], 1952-January 30, 1953. 

The ratio may be studied without the use of a supplementary scale, but 
changes in corn prices and in hog prices are not shown. 
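The computation behind a chart of the hog-corn ratio is a simple division, and the reading of the ratio against the 13-to-1 level described above may be sketched as follows. This is an illustration only: the monthly prices are hypothetical, and the function names and the wording of the three outcomes are our own.

```python
def hog_corn_ratio(hog_price_per_cwt, corn_price_per_bushel):
    """Bushels of corn required to buy 100 pounds of live hogs."""
    return hog_price_per_cwt / corn_price_per_bushel

def feeding_signal(ratio, break_even=13.0):
    """Rough reading of the ratio against the 13-to-1 level used in the text."""
    if ratio > break_even:
        return "feed corn to hogs"
    if ratio < break_even:
        return "sell corn for cash"
    return "immaterial"

# Hypothetical prices in dollars: (month, hogs per 100 pounds, corn per bushel).
prices = [
    ("month 1", 26.00, 1.60),
    ("month 2", 19.50, 1.50),
    ("month 3", 17.00, 1.70),
]

signals = []
for month, hogs, corn in prices:
    ratio = hog_corn_ratio(hogs, corn)
    signals.append((month, ratio, feeding_signal(ratio)))

for month, ratio, signal in signals:
    print(f"{month}: ratio {ratio:.2f} -> {signal}")
```

In practice the ratio seldom equals 13 exactly, so a working version would probably treat a narrow band around 13 as the zone of indifference.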
Interpolation and extrapolation. While an interpolation on an 
arithmetic chart is an arithmetic interpolation, an interpolation on the 
semi-logarithmic chart is a logarithmic interpolation. Thus, if we refer 
to Chart 5.6 and graphically interpolate for the Y value midway between 
1950 and 1951, we obtain about 790, which is approximately the same 
figure that we get if we use (log 640 + log 972) ÷ 2 and take the 
antilogarithm of the result. 
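The difference between the two kinds of interpolation may be seen numerically. The sketch below takes the 1951 value as 972 and assumes 640 for the 1950 value (the latter figure is an assumption on our part, chosen because it agrees with the result of about 790). A logarithmic interpolation at the midpoint is the antilogarithm of the mean of the two logarithms, which is the geometric mean of the two values; an arithmetic interpolation is their ordinary mean.

```python
import math

def arithmetic_midpoint(y0, y1):
    """Value midway between y0 and y1 on an arithmetic scale."""
    return (y0 + y1) / 2.0

def logarithmic_midpoint(y0, y1):
    """Value midway between y0 and y1 on a logarithmic scale.

    This is the antilogarithm of the mean of the two logarithms,
    which equals the geometric mean sqrt(y0 * y1).
    """
    return 10 ** ((math.log10(y0) + math.log10(y1)) / 2.0)

y_1950 = 640.0  # assumed reading, see the paragraph above
y_1951 = 972.0

mid_log = logarithmic_midpoint(y_1950, y_1951)
mid_arith = arithmetic_midpoint(y_1950, y_1951)

print(f"logarithmic interpolation: {mid_log:.1f}")
print(f"arithmetic interpolation:  {mid_arith:.1f}")
```

The logarithmic figure falls below the arithmetic one, as it must whenever the two end values differ.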

Extrapolation consists of extending the curve at one end or the other. 
When we extend a curve to estimate for later years than those for which 
we have data, we are forecasting. This application of the 
semi-logarithmic chart is definitely of questionable value if it involves only the 
extension of a curve which has indicated in the past that the data exhibit 
a fairly constant rate of increase. Any forecasting procedure which 
involves merely the continuation of a curve or the automatic application 
of a formula, without at the same time requiring a careful consideration 
of underlying and modifying factors, is hardly to be depended upon, 
particularly if economic conditions are in a state of flux. The curve of 
Chart 5.16 shows the population of the East South Central division of 


Chart 5.16. Population of the States in the East South Central 
Division of the United States, 1800-1950, and a Rough Estimate for 
1960. A dubious application of the semi-logarithmic chart. The states 
included in the East South Central Division are: Alabama, Kentucky, 
Mississippi, and Tennessee. Data from U. S. Bureau of the Census, U. S. 
Census of Population, Vol. I, Number of Inhabitants, pp. 1-8 and 1-9. 
[Vertical scale: population in thousands.]


the United States from 1800 to 1950. Although the extension of the 
curve indicates a possible estimate for 1960, it should be realized that any 
estimate of population in 1960 based only on a knowledge of the preceding 
censuses can have little validity. Ignored have been such considerations 
as: movements of industry to (or from) the division, possible increase in 
population in the division because of decentralization of cities located 
elsewhere, continued movement of Negroes from the division or a reversal 
of that movement, and other factors.* 

Now that the reader is aware of the nature and uses of the semi- 
logarithmic chart, he may note the occasional presentation of arithmetic 

* The problems involved in forecasting population are discussed in "Better 
Population Forecasting for Areas and Communities," by Van Beuren Stanbery, 
issued by the U. S. Department of Commerce. 

charts in books, articles, or reports when semi-logarithmic charts would 
have been more suitable. The reverse mistake is rarely made. Each 
type of chart serves a useful, but quite different, purpose. The 
arithmetic chart should be used when absolute comparisons are desired 
(Charts 5.10 and 5.15 are absolute comparisons of ratios); the 
semi-logarithmic chart should be employed when relative comparisons are 
called for. 

CONSTRUCTION OF LOGARITHMIC SCALES 

One logarithmic cycle will accommodate a tenfold increase; two cycles 
make provision for a hundredfold increase. Reference to the various 
charts included in this chapter will show that no vertical logarithmic scale 
(other than those shown in Chart 5.7) extends over more than two cycles. 
Two-cycle semi-logarithmic paper will suffice for most series which the 
chart maker is likely to encounter; rarely will he need paper covering 
more than three cycles, since it allows for a thousandfold increase. Even 
in cases where a series of very small magnitude must be compared with 
one of very large magnitude, a number of cycles is not needed, since it is 
desirable to use two vertical scales to bring the two curves together for 
comparison, as in Charts 5.9 and 5.13. Many sorts of ready-ruled 
semi-logarithmic paper are available from various sources. If, however, only 
two-cycle paper is available and paper having more cycles is needed, it is 
merely necessary to trim the lower margin from a sheet of two-cycle paper 
and paste it above another sheet. 
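The number of cycles a series will require may be worked out before the paper is chosen. The following sketch assumes, as is the case with ready-ruled paper, that each cycle begins at a power of ten (1, 10, 100, and so on); the function name and the sample ranges are our own and are given only for illustration.

```python
import math

def cycles_needed(smallest, largest):
    """Whole logarithmic cycles needed to plot values from smallest to largest.

    Assumes every cycle starts at a power of ten, as on ready-ruled paper.
    """
    if smallest <= 0 or largest < smallest:
        raise ValueError("values must be positive, with largest >= smallest")
    bottom = math.floor(math.log10(smallest))  # power of ten at the foot of the scale
    top = math.ceil(math.log10(largest))       # power of ten at the head of the scale
    return max(1, top - bottom)

# Hypothetical ranges (smallest value, largest value).
one_cycle = cycles_needed(12, 95)       # fits between 10 and 100
two_cycles = cycles_needed(120, 9500)   # fits between 100 and 10,000
straddling = cycles_needed(5, 50)       # only a tenfold range, but it crosses 10
four_cycles = cycles_needed(3, 2500)    # runs from 1 to 10,000

print(one_cycle, two_cycles, straddling, four_cycles)
```

The third case shows why a series with merely a tenfold range may still call for two-cycle paper: the range straddles a power of ten.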

At times it may be desirable to use one- or two-cycle paper, but with a 
larger- or smaller-size cycle than those which are readily available. Using 
an ordinary sheet of semi-logarithmic paper and placing a sheet of plain 
paper diagonally on top of it, a logarithmic scale may be expanded as 
shown in Chart 5.17. A logarithmic scale may be contracted by placing 
a sheet of semi-logarithmic paper diagonally on a piece of plain paper and 
ruling horizontal lines, as shown in Chart 5.18. For those who have 
frequent occasion to use logarithmic scales of varying size, a device such as 
that shown in Chart 5.19 is useful.* The original of this chart provides 
a logarithmic cycle varying from [illegible] inches to 12 inches. Of course, any 
number of cycles may be built up on top of one another. 

In case no suitable logarithmic paper and no logarithmic scales of any 
sort are available, it is possible to construct a logarithmic scale of any 
desired size by referring to a table of logarithms. With scale values 
spaced in proportion to the differences between their logarithms, a scale 

* Designed by Harriet Edmunds, of The Chartmakers, Inc., 480 Lexington Ave., 
New York, N. Y. 

Chart 5.17. A Method of Expanding a Logarithmic Scale. 

Chart 5.18. A Method of Contracting a Logarithmic Scale. 

may be constructed in terms of any convenient unit. From the figures 
shown below it is seen that the distance from 1 to 2 would be 0.301030 
units, the distance from 2 to 3 would be 0.176091 units, and so on. 
Intermediate values are located similarly. 


Scale value     Logarithm     Difference

      1         0
      2         0.301030      0.301030
      3         0.477121      0.176091
      4         0.602060      0.124939
      5         0.698970      0.096910
      6         0.778151      0.079181
      7         0.845098      0.066947
      8         0.903090      0.057992
      9         0.954243      0.051153
     10         1.000000      0.045757
     20         1.301030      0.301030
     30         1.477121      0.176091
     40         1.602060      0.124939
     50         1.698970      0.096910
     60         1.778151      0.079181
     70         1.845098      0.066947
     80         1.903090      0.057992
     90         1.954243      0.051153
    100         2.000000      0.045757
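The construction just described, in which scale values are spaced in proportion to the differences between their logarithms, may be carried out for a cycle of any length. The sketch below is one way of doing so; the 5-inch cycle is a hypothetical choice, and the function names are our own.

```python
import math

def log_scale_positions(values, cycle_length=1.0):
    """Distance of each scale value above the value 1.

    One full cycle (a tenfold increase) occupies cycle_length units.
    """
    return [(v, cycle_length * math.log10(v)) for v in values]

def gaps(positions):
    """Distance between each pair of successive scale marks."""
    return [later[1] - earlier[1] for earlier, later in zip(positions, positions[1:])]

# Marks for 1 through 10 on a hypothetical cycle 5 inches long.
marks = log_scale_positions(range(1, 11), cycle_length=5.0)
spacing = gaps(marks)

for (value, distance), gap in zip(marks, [0.0] + spacing):
    print(f"{value:>3}  {distance:8.4f} in. from the foot  (gap {gap:.4f})")
```

With cycle_length left at 1.0 the distances reproduce the logarithms in the table above, and the gaps reproduce the column of differences.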


The usefulness of logarithmic scales is not limited to the applications 
shown in this chapter. In Chapter 23 we shall make use of a horizontal 
logarithmic scale and an arithmetic vertical scale. In Chapter 20 we 
shall use logarithmic scales on both the horizontal and vertical axes. 



CHAPTER 6 


Graphic Presentation III: 

OTHER TYPES OF CHARTS 


A number of other graphic devices, in addition to curves, are available 
for presenting statistical information. In this chapter we shall give 
brief attention to bar charts, pie diagrams, pictographs, and statistical 
maps. 


BASES OF COMPARISON 

Chart 6.1 shows how the number of tractors on farms may be compared 
by means of three types of diagrams: (A), a bar chart involving one- 
dimensional comparisons; (B) and (C), circles and squares, involving 
two-dimensional comparisons; and (D), a three-dimensional comparison 
represented by tractors of varying sizes. Readers of charts obtain most 
accurate impressions of the magnitudes shown when data are represented 
by means of bar charts, and least accurate impressions when data are 
represented by volume diagrams. Area diagrams are more accurately 
judged than volume diagrams, but less accurately than bar charts.¹ It 
should also be remembered that volume diagrams shown on the printed 
page make it necessary for the reader to visualize the third dimension 
before making his comparison. Another disadvantage of charts using 
squares, circles, or pictures of different sizes is that the reader may be 
uncertain whether to compare heights, areas, or volumes. In any event, 
the basis upon which the diagram was drawn should be indicated. If it 
is argued that the correct basis of comparing the size of such objects as 
tractors is the apparent weight of the different tractors and if the chart 
maker has drawn the tractors so that the number of tractors in different 
years is shown by the height or length of the tractors, as is sometimes 

¹ See "Graphic Comparisons by Bars, Squares, Circles, and Cubes," by Frederick 
E. Croxton and Harold Stein, Journal of the American Statistical Association, March 
1932, pp. 54-60. 

Chart 6.1. Number of Tractors on Farms in the United States, 
1930, 1940, 1945, 1950, and 1953. The data are represented by (A) 
bars, (B) circles, (C) squares, and (D) pictures of tractors. Part A 
involves linear comparisons; parts B and C require comparisons of 
areas; part D calls for comparisons of volumes. Data from Agricultural 
Statistics, 1952, p. 631, and 1953, p. 560. 

done, then the reader who judges the sizes upon the basis of apparent 
weight (essentially volume) will get an exaggerated impression of the 
variation in number of tractors during the different years. 
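The amount of exaggeration may be stated exactly. If every linear dimension of a picture is enlarged in proportion to the data, the area grows as the square of that ratio and the apparent volume as the cube. The sketch below, with a hypothetical fourfold increase and function names of our own, shows both the exaggeration and the linear scale factor that would avoid it.

```python
def apparent_ratio(true_ratio, dimensions):
    """Ratio perceived when every linear dimension is scaled by true_ratio
    and the figure is judged by length (1), area (2), or volume (3)."""
    return true_ratio ** dimensions

def linear_scale_for(true_ratio, dimensions):
    """Linear scale factor that makes the length, area, or volume
    itself proportional to the data."""
    return true_ratio ** (1.0 / dimensions)

true_ratio = 4.0  # hypothetical: the later figure is four times the earlier

judged_by_height = apparent_ratio(true_ratio, 1)
judged_by_area = apparent_ratio(true_ratio, 2)
judged_by_volume = apparent_ratio(true_ratio, 3)

# Height multipliers that would give a fourfold area or a fourfold volume.
height_for_area = linear_scale_for(true_ratio, 2)
height_for_volume = linear_scale_for(true_ratio, 3)

print(judged_by_height, judged_by_area, judged_by_volume)
print(round(height_for_area, 3), round(height_for_volume, 3))
```

A tractor drawn four times as tall thus appears to weigh 64 times as much, although the number of tractors has only quadrupled.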

Charts involving volume comparisons appear all too often in news- 
papers and magazines. Later in this chapter we shall see how it is pos- 
sible, by means of pictographs, to obtain the attention-getting value of 
pictures and at the same time get visual impressions as accurate as may 
be had from bar charts. 


BAR CHARTS 

The bar chart shown in section A of Chart 6.1 is a simplified form using 
no scale. In Chart 6.2 the same data are shown by means of a bar chart 

Chart 6.2. Number of Tractors on Farms in the 
United States, 1930, 1940, 1945, 1950, and 1953. Data 
from sources given below Chart 6.1. [Vertical scale: thousands.]

which has a scale and which also varies the spacing between the bars in 
order to call attention to the fact that the time intervals vary. When the 
chart is expected merely to convey a very general impression, simple bar 
charts may be drawn without the use of a scale, as in section A of Chart 
6.1. However, when two (or more) bar charts using different scales are 
in juxtaposition and may be compared with each other, the scales should 
be shown. Another caution concerns the presence of zero on the scale; 
Chart 6.3, which lacks the zero, shows that the omission of the zero is just 
as misleading in this type of chart as in the case of arithmetic curves. 

All of the preceding bar charts showed chronological data, and, follow- 
ing the customary procedure, the bars were arranged vertically. Vertical 
bars should also be used for data classified quantitatively, for example, 

Chart 6.3. A Bar Chart Lacking a Zero on the 
Vertical Scale. From National Board of Fire Underwriters, 
Fire Insurance Facts and Trends, August 1953. 

Chart 6.4. Acreage Harvested in the United States of Corn, 
Wheat, Oats, Barley, and Rye, 1952. Data from Agricultural 
Statistics, 1953, pp. 1, 16, 31, 41, and 47. The acreages given for oats, 
barley, and rye are the acreages harvested for grain. [Horizontal scale: 
millions of acres. Bar labels: corn, 81,359,000; wheat, 70,585,000; oats, 
38,6[?]3,000; barley, 8,264,000; rye, 1,385,000.]


data of the number of persons in the United States classified by age groups 
or according to years of schooling. When making comparisons of data 
classified qualitatively or geographically, on the other hand, horizontal 
bars are generally used. Chart 6.4 shows such a comparison of the 
acreage harvested of each of five crops in 1952. 

Chart 6.5. An Application of the Bar Chart. From United States 
Bureau of Labor Statistics, Wages, Hours, and Working Conditions in the 
Bread-Baking Industry, 1934, Bulletin No. 628, p. 75. [Title on the chart: 
Proportion of Day and Night Operation in 93 Bakeries, for 7,569 Bake Shop 
Employees, in 28 States, 1934.]

Chart 6.6. Native-Born and Foreign-Born Population of the United 
States, 1880-1950. The relative growth of the two series is not apparent from 
this type of chart, but may be shown by means of a semi-logarithmic chart, as 
described in the preceding chapter. Because of the nonexistence of zero on a 
logarithmic scale, curves would be used instead of bars. Data from Statistical 
Abstract of the United States, 1952, p. 81, and from U. S. Census of Population, 
1950, Vol. II, Part 1, Chapter B, p. 1-87, and Vol. IV, Part 3, Chapter B, p. 
3B-82. [Vertical scale: millions of persons.]

There are no set rules to be observed in drawing bar charts. Certain 
considerations, however, are helpful. 

(1) Individual bars should be neither exceedingly short and wide nor 
very long and narrow. 

(2) Bars should be separated by spaces which are not less than about [fraction illegible] 
the width of a bar or greater than about the width of a bar. 

Chart 6.7. Acreage Harvested in the United States 
of Corn, Wheat, Oats, Barley, and Rye, 19[?]0 and 1952. 
Data from source given below Chart 6.1. [Horizontal scale: millions of acres.]

(3) A scale is generally useful. It should be about [fraction illegible] the width of a bar 
from the top bar (or from the left bar, if the bars are vertical). 

(4) Guide lines are an aid in reading the chart. Sometimes the chart 
is enclosed and the guide lines are extended through the entire chart, as 
in Chart 6.4; sometimes the chart is not enclosed and the guide lines are 
cut off, as in Chart 6.7. 

When showing a time series graphically, we may use either a bar chart 
or a curve. A curve facilitates a study of the general change which has 
taken place in a series, whereas a bar chart enables comparisons of specific 
years to be made more readily. If the series covers many years, it is 
generally not desirable to use a bar chart, which is laborious to construct. 
When only a few years are shown, as in Chart 6.2, a bar chart is preferable. 

Chart 6.5 shows an interesting application of the principle of the bar 
chart. It indicates for each of 93 bakeries the proportion of day and night 
operation during a year. The advantage of this chart is that it shows the 

Chart 6.8. Percentage of Increase or Decrease in 
Planned Plant and Equipment Outlays for 1953 as Compared 
with Investment for 1952, for Six Industry 
Groups. From Survey of Current Business, April 1953, p. 1. 
[Horizontal scale: per cent change, 1952-53.]

information for each of the 93 concerns in a more compact form than could 
well be done otherwise. 

Sometimes we wish to compare two sets of data over a period of several 
years. This may be done by means of a two-unit bar chart, as shown in 
Chart 6.6. Similarly, we may wish to compare several categories for 
two years; a comparison of this nature is shown in Chart 6.7. 

A two-direction bar chart, such as Chart 6.8, may be used to show 
increases and decreases. This type of chart is even more effective if 
increases can be shown in black and decreases in red. Increases and 
decreases in a series of data for a number of years may be shown by means 
of vertical bars above and below a horizontal zero line. 

PICTOGRAPHS 

In section D of Chart 6.1 the number of tractors on farms at each of 
certain years was represented by means of pictures of tractors of varying 
size. While this sort of chart does not convey a satisfactory comparison 
to a reader, it does attract attention. The pictorial effect may be 
retained and a satisfactory visual comparison afforded by using a number 
of small pictures, all of the same size, and arranging them so as to form 

Chart 6.9. Number of Tractors on Farms in the United States, 
1930, 1940, 1945, 1950, and 1953. Each symbol represents 1,000,000 
tractors. Data from Agricultural Statistics, 1952, p. 631, and 1953, 
p. 560. The tractor was designed by Pictorial Statistics Co. 

Chart 6.10. A Pictograph Used by Hobart and 
William Smith College. From Let's Look at Hobart 
and William Smith, p. 14. The original was in two 
colors. 

a bar chart. Such a graph is referred to as a pictograph. Chart 6.9 
shows a comparison of tractors on farms by means of this device. While 
the diagram is essentially a bar chart, it is more attractive and thus is 
more likely to be examined by a reader. No scale is used, but since the 
pictures are all of the same size and since each represents one million 
tractors, approximate numerical values may be had from the chart, if they 
are wanted. Although a bar chart of a time series generally uses vertical 
bars, it will be observed that the pictograph shown as Chart 6.9 has 
horizontal bars. Pictographs are often arranged in this way because it seems 
more suitable to have tractors, people, houses (or whatever is being 
pictured) standing side by side rather than on top of one another. 

Chart 6.11. A Modified Pictograph. From Health 
Insurance Council, Accident and Health Coverage in the 
United States, September 1953, p. 21. 

Chart 6.10, another example of a pictograph, is an interesting method 
of showing that campaigns for funds are apt to depend heavily upon 
relatively few large gifts. Chart 6.11 represents a slightly different 
application of the pictograph idea. Here, bars and a scale are used, but 
pictures are superimposed upon the bars. (Use was also made of a 
picture in Chart 6.3, which was not a pictograph.) It should be apparent 
that, in making a pictograph, the picture is so chosen as to suggest the 
nature of the data being shown. Certain basic rules for the use of 
pictorial devices are shown in Chart 6.12. 
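The principle of the pictograph, that quantities are shown by more or fewer symbols of one size rather than by larger or smaller symbols, may be sketched with ordinary characters standing in for pictures. In the illustration below the figures are hypothetical, the "#" stands for one whole symbol, and the "+" for a part symbol; the rule of showing a part symbol only when the remainder is at least half a unit is our own simplification.

```python
def pictograph_rows(data, unit, symbol="#", part_symbol="+"):
    """Return one text row per category; each whole symbol stands for `unit`.

    A remainder of half a unit or more is shown by one part symbol;
    smaller remainders are dropped, since the chart is approximate.
    """
    rows = []
    for label, value in data:
        whole, remainder = divmod(value, unit)
        row = symbol * int(whole)
        if remainder >= unit / 2:
            row += part_symbol
        rows.append(f"{label}  {row}")
    return rows

# Hypothetical counts; each symbol represents 1,000,000 items.
data = [("1930", 900_000), ("1940", 1_600_000), ("1950", 3_400_000)]
rows = pictograph_rows(data, unit=1_000_000)

for row in rows:
    print(row)
```

The rows are arranged horizontally, one beneath another, in keeping with the usual arrangement of pictographs noted above.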


Chart 6.12. The Basic Rules for Drawing Pictographs 
as Suggested by Modley and Lowenstein. 
From Rudolph Modley and Dyno Lowenstein, Pictographs 
and Graphs, Harper and Brothers, New York, 1952, 
pp. 25 and 26. [Rules shown on the chart: symbols should be 
self-explanatory; changes in numbers are shown by more or fewer symbols, 
not by larger or smaller ones; charts give an over-all picture, not minute 
details; pictographs make comparisons, not flat statements.]

COMPONENT-PART CHARTS 

The parts of a total may be shown by means of a bar as in Chart 6.13 
or by a pie diagram as in Chart 6.14. The bar chart involves a 
one-dimensional comparison of the lengths of the sections of the bar; whereas 
the pie diagram involves a two-dimensional comparison of the pie sections, 
or a one-dimensional comparison of the arcs of the pie sections, or a 
comparison of the central angles. Accuracy of judgment is about the same 
whether based on a bar chart or a pie diagram,² with the exception that, 


² See "Bar Charts Versus Circle Diagrams," by Frederick E. Croxton and Roy E. 
Stryker, Journal of the American Statistical Association, December 1927, pp. 473-482. 

when depicted by a pie diagram, 25-per-cent (shown by a right angle) and 
50-per-cent (shown by a diameter) sections are more accurately gauged. 
The pictorial value of the pie diagram is perhaps greater than that of the 
bar chart, and it is increased when the pie diagram is designed to suggest 
a silver dollar. Chart 6.15 shows an application of this sort. A single 
component-part bar is occasionally drawn without a scale and is sometimes 
horizontal. One advantage of the vertical bar over either the horizontal 
bar or the pie diagram is that the sections of the vertical bar are 
easier to label. 

Several suppliers of graph paper offer sheets showing a circle with the 
circumference graduated from 0 to 100, thus enabling one to construct pie 
diagrams readily. If such sheets are not available or if varying sizes of 
circles are desired, pie diagrams may be made by the use of compasses and 
a protractor. Since the conventional protractor divides a circle into 360 
parts or degrees, the percentages which are to be shown should be 
multiplied by 3.6. Dividing a circle into percentages is facilitated by 
use of a protractor³ calibrated to divide a circle into 100 parts, as 
shown in Chart 6.16; such a scale may be engraved or otherwise marked on 
the back of an ordinary protractor. 
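The conversion of percentages to degrees for use with a conventional protractor may be sketched as follows. The shares are hypothetical and the function name is our own; each percentage is multiplied by 3.6, and the sections are laid off one after another around the circle.

```python
def pie_angles(percentages):
    """Convert percentages to (start, end) central angles in degrees.

    Each percentage is multiplied by 3.6, since 100 per cent of the
    circle corresponds to 360 degrees.
    """
    total = sum(percentages.values())
    if abs(total - 100.0) > 0.5:
        raise ValueError(f"percentages add to {total}, not 100")
    angles = {}
    start = 0.0
    for label, pct in percentages.items():
        sweep = pct * 3.6
        angles[label] = (start, start + sweep)
        start += sweep
    return angles

# Hypothetical component parts of a total, in per cent.
shares = {"A": 25.0, "B": 50.0, "C": 12.5, "D": 12.5}
angles = pie_angles(shares)

for label, (begin, end) in angles.items():
    print(f"{label}: {begin:6.1f} to {end:6.1f} degrees")
```

The 25-per-cent section spans a right angle and the 50-per-cent section a diameter, the two cases which readers are said above to gauge most accurately.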

Chart 6.17 shows how bar charts may be 
used to compare several sets of component 
parts and also how the same comparisons 
may be made by means of pie diagrams. 
It seems clear that comparisons between 
the years are made more easily from the 
bars than from the circles. The guide lines 

Chart 6.13. Proportion of the Population of the United 
States in Each Specified Age Group, 1950. Data from U. S. 
Bureau of the Census, U. S. Census of Population, 1950, Vol. 
II, Characteristics of the Population, Part 1, United States 
Summary, p. 1-93. [Vertical scale: per cent.]


running from section to section assist in making comparisons from the bar 
chart; when the lines are parallel, there has been no change; when they 
diverge, there has been an increase; when they converge, a decrease has 
occurred. 


The comparison of component parts in Chart 6.17 is on a relative basis; 


³ See "Percentage Protractor," by Frederick E. Croxton, Journal of the American 
Statistical Association, March 1922, pp. 108-109. 





Chart 6.14. Proportion of the Population of the 
United States in Each Specified Age Group, 1950. 
Data from source given below Chart 6.13. 

Chart 6.15. Pie Diagrams Used in Connection with the President's Budget 
Message for 1955. 

Chart 6.16. Percentage Protractor. 

Chart 6.17. Proportion of the Population of the United States in Each 
Specified Age Group, 1910-1950. Data from sources given below Chart 4.23. 

the proportion of each age group in the population is shown. When we 
indicate how many of each age group were enumerated, we have diagrams 
such as are shown in Chart 6.18. The bars and circles vary in size 
because the total population has increased. In this instance the bar 
chart is clearly preferable to the pie diagram. When data such as those 
shown in Charts 6.17 and 6.18 cover a number of years, it is generally 
preferable to make use of curves, as was done in Charts 4.23 and 4.24. 
While the bar charts of Charts 6.17 and 6.18 present chronological data, 
we may also compare component parts for different places or categories. 


Chart 6.18. Population of the United States in Each Specified Age Group, 
1910-1950. Data from sources given below Chart [number missing in source]. 
[Vertical scale: millions of persons. Age groups: under 20, 20-39, 40-59, 
60 and over.]


For example, we might compare the proportions of males and females in 
the urban population with the proportions of males and females in the 
rural population. One bar, subdivided for males and females, would 
represent the urban population; the other bar, similarly divided for the sexes, 
would represent the rural population. 


STATISTICAL MAPS 

Statistical maps are graphic devices which show quantitative 
information on a geographical basis. We shall consider hatched or shaded maps, 
dot maps, and pin maps. 

Hatched maps. Hatched or shaded maps undertake to show for each 
geographical area under consideration the magnitude of the phenomenon 
which is being studied. The variations in magnitude are represented 
graphically by progressive differences in hatching or shading. In Chart 
6.19 the various hatchings indicate the "levels of living" of 
farm-operator families in the counties of the United States in 1950. The counties 
having the highest levels of living are shown in solid black, and the 


Chart 6.19. A Hatched Map. 


hatching becomes progressively lighter so that the lightest indicates the 
counties which had the lowest levels of living. The outstanding 
characteristic of maps such as this is that a progressive change in the 
hatching or shading indicates an increase (or decrease) in the phenomenon 
being measured. 

Sometimes statistical maps are made in colors. However, the principle 
of progressive shading cannot be developed satisfactorily by using dif- 
ferent colors. It is possible, of course, to use progressive shades of a sin- 
gle color and thus sometimes to produce a more attractive map than 
could be done by using black and white. 

Dot maps. The preceding statistical map showed data that applied
to entire areas (specifically, the average level of living for counties),
and so a hatched or shaded map was appropriate. When the geographical
distribution of occurrences is to be shown, the dot map should be used.
Chart 6.20 shows one of the simplest of dot maps. Each dot represents
500 farms, and the concentration in various parts of the country is
clearly shown. In a dot map, the number of units represented by one dot
may be large, as in Chart 6.20, so that the number of dots in a region is
small enough to be counted, or the number of units represented by one
dot may be small, so that the numerous dots give the effect of a gradual
change in intensity of shading from light to dark. Which technique to
use depends on the purpose of the chart.

A different sort of dot map is shown in Chart 6.21, which uses dots of
varying size. In this study, 4,030 truck drivers were stopped at various
places and were asked how long they had been driving and certain other
correlative questions. The areas of the circles indicate the relative
number of drivers questioned at each point. While the varying circle
sizes indicate clearly that more drivers were quizzed at certain places
than at others, it is not easy to make accurate comparisons from these
dots. We cannot compare diameters directly. We must remember that, if
one circle has a diameter twice as great as another, then the first
circle has an area four times that of the second.
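The relation between diameter and area can be checked with a short
Python sketch. The function names are our own and are introduced only
for illustration; the second function assumes that the chart maker wants
circle areas proportional to the counts being shown.

```python
import math


def circle_area(diameter: float) -> float:
    """Area of a circle computed from its diameter."""
    return math.pi * (diameter / 2) ** 2


def diameter_for_count(count: float, ref_count: float, ref_diameter: float) -> float:
    """Diameter of a circle whose AREA is proportional to `count`,
    given a reference circle of `ref_diameter` standing for `ref_count`."""
    return ref_diameter * math.sqrt(count / ref_count)


# Doubling the diameter quadruples the area.
area_ratio = circle_area(2.0) / circle_area(1.0)
print(area_ratio)

# To show four times as many drivers, the diameter is only doubled.
print(diameter_for_count(400, 100, 1.0))
```

A reader who compares diameters rather than areas would therefore
understate the larger figure by half in this example.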

Pin maps. Pin maps may be thought of as a particularly flexible sort
of dot map. They consist of maps mounted on a backing of cork, cardboard,
wallboard, corrugated cardboard, or the like, on which information is
recorded by means of pins having (usually) glass heads of different
size, color, and shape. The available pins have heads that range in
size from a small fraction of an inch to about 1 inch in diameter. A
large number of colors is available as well as a variety of shapes, such
as round-, square-, and triangular-head pins. Pin maps may be readily
altered as the facts change. Because of this flexibility and the wide
variety of pins available, the pin map is frequently employed as a
method of presenting geographical data. An extensive pin-map scheme,
involving one or more maps mounted on cork and hundreds or thousands of
pins, is expensive but may often prove very useful.

Chart 6.21. Number of Drivers Interviewed and Location of Interview in a
Study of Driving Practices of Truckers. Reproduced from National
[illegible], How Long on the Highway, 193[?], p. 19. Note that five of
the states are not identified.

Chart 6.22. An Automobile Accident Pin-Map of the City of Syracuse, New
York. From National Safety Council, Chicago, Illinois.

Chart 6.23. Map with Superimposed Bar Charts. From Health Insurance
Council, Accident and Health Coverage in the United States, September
1953, p. 17.

Chart 6.22 shows a pin map used to record the location and result of
automobile accidents. By using one or more such maps, it is possible
to observe not only the frequency with which accidents occur at various
places, but also the nature of each accident (automobile hitting
pedestrian, automobile hitting automobile, automobile hitting fixed
object, and so forth) and the result of the accident (property damage,
occupant injured, occupant killed, pedestrian injured, pedestrian
killed, and so on).

One difficulty with the statistical map is that the importance of
different regions is not to be judged by their areas. For instance, a
hatched map showing income per family in different states would be
somewhat misleading because there are many more families in some of the
states occupying very small areas than there are in other states
occupying very large areas. An interesting device sometimes used for
overcoming this difficulty consists of drawing the map in such a way that
the area of each state is in proportion to the number of families in
that state.

Occasionally a map and some other type of chart are used in combination.
Chart 6.23 shows a map on which four simple bar charts have
been superimposed. The original of the map had the bars for hospital,
medical, and surgical coverage in red and the bars for population in
black. With the geographical areas separated, the reader may visualize
exactly what territory is referred to in each instance.



CHAPTER 7 


Rates, Ratios, and Percentages 


It was pointed out in the chapter dealing with statistical tables that
derived figures are useful to assist in summarizing and comparing data.
In that chapter specific mention was made of rates,¹ ratios, percentages,
and averages. This chapter will discuss rates, ratios, and percentages.
Averages and related measures will be examined in later chapters.

To express the ratio which 753 bears to 251, we divide 753 by 251,
which gives 3, and we say that 753 is to 251 as 3 is to 1, or more
briefly, 753:251::3:1. We have thus indicated the relationship which the
first of these two numbers bears to the second as a ratio to one. If it
suited our purpose better, we could express the relationship as a ratio
to any other number. For example, we could use a ratio to ten, saying
753:251::30:10; we could use a ratio to one hundred and write
753:251::300:100. This last ratio, per hundred, is generally referred to
as a percentage, and we note that 753 is 300 per cent (from per centum)
of 251. It will thus be seen that percentages, which are used so
frequently, are merely special cases of the more general concept of
ratios. If, instead of using a ratio per hundred, we find occasion for a
ratio per thousand, we may refer to our figures as "per mille."
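The same relationship expressed on several bases may be sketched in
Python. The helper name ratio_per is our own, not a standard library
function; it simply divides by the base and scales by the chosen unit.

```python
def ratio_per(value: float, base: float, per: float = 1) -> float:
    """Express `value` relative to `base` as a ratio per `per` units."""
    return value / base * per


to_one = ratio_per(753, 251)           # ratio to one
to_ten = ratio_per(753, 251, 10)       # ratio to ten
per_cent = ratio_per(753, 251, 100)    # percentage
per_mille = ratio_per(753, 251, 1000)  # per mille

print(to_one, to_ten, per_cent, per_mille)
```

Only the scaling changes from one form to the next; the underlying
division of 753 by 251 is the same in every case.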

Ratios are computed in order to expedite comparisons. Not only are
large numbers reduced as in Table 3.4, but much is gained by comparing a
series of figures with a rounded base of 100 (which can be carried in
one's mind) rather than by attempting to compare each individual
population figure with the total for the entire United States. Relative
change may be visualized more concretely when shown by percentages, as
in Table 7.1, or when shown by one of the methods used in Table 7.2.

¹ The term rate is sometimes used to mean the amount or quantity of one
variable considered in relation to one unit of a different variable.
Thus, 20 miles per hour is a rate of speed. The relationship that two
similar variables bear to each other is often termed a ratio. For
example, the current ratio, which is the ratio of current assets to
current liabilities, compares two figures which are both in terms of
dollars. General usage does not always observe this distinction between
rate and ratio.

TABLE 7.1

Acres Harvested of Selected Grains in the United States,
1951 and 1952

Grain: Corn, Oats, Barley, Rice, Rye, Buckwheat
[Acreage and per cent columns not recoverable.]

* A minus sign denotes a decrease.
Data from Crop Reporting Board, Bureau of Agricultural Economics,
Crop Production, March 10, 1953, p. 11.

TABLE 7.2

Production of Steel Ingots and Steel for Castings in the United
States, 1943-1952

                     Per cent   Per cent    Per cent of   Per cent increase
Year    Production   of 1943    increase    preceding     over preceding
                                over 1943*  year          year*
(1)        (2)         (3)        (4)          (5)            (6)

1949     77,978        87.8      -12.2         88.0          -12.0
1950     96,836       109.0        9.0        124.2           24.2
1951    105,200       118.4       18.4        108.6            8.6
1952     93,156‡      104.9        4.9         88.6          -11.4

[Rows for 1943-1948 not recoverable.]

* A minus sign denotes a decrease.
‡ Preliminary.
Data from various issues of the Survey of Current Business.

CALCULATION

When one or more numbers are being compared to another number, the
figure to which comparisons are made is known as the base. A ratio is
found by dividing the figure which is being compared to the base by
the base. The figure is then expressed in terms of or in relation to the
base, and ratios of all sorts are therefore sometimes referred to as
relative numbers or relatives.

The amount of money in circulation in the United States on June 30,
1943, was $17,421,201,974. On June 30, 1952, the circulating medium
totaled $29,025,925,276. To state the 1952 circulation in terms of the
1943 circulation (the base), we divide $29,025,925,276 by $17,421,201,974
and obtain 1.6661. This means that the money in circulation in 1952
was 1.6661 times as great as in 1943. In many instances, ratios are most
useful when stated as percentages. To change 1.666, the ratio to one,
to a ratio per hundred, the decimal point is moved two places to the
right; the resulting figure, 166.6, indicates that money in circulation
in 1952 amounted to 166.6 per cent of the amount in circulation in 1943.

² Instructions for operating calculating machines may be obtained from
the sales offices of the calculating machine companies.

It should be noticed that there are two ways in which we can express
the percentage figure just given. Instead of saying that 1952
circulation was 166.6 per cent of 1943 circulation, we may say that
circulation in 1952 was 66.6 per cent greater than in 1943. In the first
instance, we compared the figures for the two years; in the second, we
compared the change which took place³ with the figure for 1943.
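The two ways of stating the comparison may be sketched in Python, using
the circulation figures quoted in the text. The function names
percent_of and percent_change are our own labels for the two
calculations.

```python
circulation_1943 = 17_421_201_974  # dollars, June 30, 1943 (the base)
circulation_1952 = 29_025_925_276  # dollars, June 30, 1952


def percent_of(value: float, base: float) -> float:
    """`value` stated as a percentage of `base`."""
    return 100 * value / base


def percent_change(value: float, base: float) -> float:
    """Change from `base` to `value`, stated as a percentage of `base`."""
    return 100 * (value - base) / base


as_percent_of_1943 = round(percent_of(circulation_1952, circulation_1943), 1)
increase_over_1943 = round(percent_change(circulation_1952, circulation_1943), 1)

print(as_percent_of_1943)   # 1952 as a per cent of 1943
print(increase_over_1943)   # per cent by which 1952 exceeded 1943
```

The two results always differ by exactly 100, since the second subtracts
the base itself before dividing.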

EFFECT OF CHANGING BASE

Naturally, a different set of percentages would be obtained if we
compared the 1943 circulation figure with the 1952 figure. We are now
using 1952 as the base and the 1943 figure is divided by that for 1952.
Performing this operation indicates that circulation in 1943 was 60.0 per
cent of that in 1952, or that circulation in 1943 was 40.0 per cent less
than that in 1952. Observe that, while the 1952 figure was 66.6 per cent
greater than the 1943 figure (1943 was the base), the 1943 figure was
40.0 per cent less than the 1952 figure (1952 was the base). This
difference is, of course, due to the fact that the basis of comparison
was first in reference to 1943, then to 1952. If a number is increased
100 per cent, the second number need be decreased but 50 per cent to
arrive at the original figure. Conversely, if a given number is decreased
50 per cent, the second number must be increased 100 per cent to
reproduce the given number.

³ Suppose we are comparing two percentages, as 4.0 per cent and 9.0 per
cent. We may speak in absolute terms and say that 9.0 per cent is 5.0
per cent more than 4.0 per cent. We may speak in relative terms and say
that 9.0 per cent is 125 per cent greater than 4.0 per cent, or that 9.0
per cent is 225 per cent of 4.0 per cent. When comparing percentages, it
is advisable to make quite clear whether we are speaking in absolute or
relative terms.

The failure to realize the effect of this change of base may lead to the
drawing of false conclusions. A firm decreased the wages of its employees
15 per cent; later it increased the reduced wages 5 per cent; then it
raised these increased figures 5 per cent; and finally it increased these
second figures another 5 per cent. Afterwards it announced that the three
5 per cent increases put wages back where they were before the 15 per
cent reduction. Calculation will show that the new wages were really
98.4 per cent of the original wages before reduction. If the company
had given a single 15 per cent increase of the reduced wages, the new
wages would have been but 97.75 per cent of the original wages.
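The wage calculation may be verified with a brief Python sketch. The
function apply_changes is our own illustrative helper, and the starting
wage of 100 is an arbitrary index value chosen so that the result reads
directly as a percentage of the original wage.

```python
def apply_changes(start: float, pct_changes: list[float]) -> float:
    """Apply successive percentage changes, each computed on the
    figure produced by the change before it."""
    level = start
    for pct in pct_changes:
        level *= 1 + pct / 100
    return level


# A 15 per cent cut followed by three successive 5 per cent raises.
three_raises = apply_changes(100.0, [-15, 5, 5, 5])

# A 15 per cent cut followed by one 15 per cent raise.
single_raise = apply_changes(100.0, [-15, 15])

print(round(three_raises, 1))
print(round(single_raise, 2))
```

Neither sequence restores the original wage, because every raise is
figured on a base that is smaller than the original.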

Table 7.3 shows for selected percentages of increase the per cent which
the new number must be decreased to reproduce the original number. It
should be borne in mind that a per-cent-of-increase figure may be
indefinitely large; however, a per-cent-of-decrease figure of 100
indicates a decline to zero, while a per cent of decrease of over 100
indicates a fall to a negative quantity.

TABLE 7.3

Illustrations of Effect of Shifting Base in Percentages
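Figures of the kind such a table contains may be generated with the
following Python sketch. The function names are our own, and the
percentages in the loop are chosen here for illustration; they are not
necessarily the ones printed in Table 7.3.

```python
def decrease_to_restore(pct_increase: float) -> float:
    """Per cent by which the increased number must be reduced
    to get back to the original number."""
    return 100 * pct_increase / (100 + pct_increase)


def increase_to_restore(pct_decrease: float) -> float:
    """Per cent by which the reduced number must be raised to get back
    to the original number (requires a decrease of less than 100)."""
    if pct_decrease >= 100:
        raise ValueError("a decrease of 100 per cent or more cannot be restored")
    return 100 * pct_decrease / (100 - pct_decrease)


# Illustrative percentages of increase (an assumption, not the table's own list).
for pct in (5, 10, 25, 50, 100, 200, 500, 1000):
    print(f"{pct:>5} per cent increase -> "
          f"{decrease_to_restore(pct):5.1f} per cent decrease to restore")
```

However large the increase, the decrease needed to undo it stays below
100 per cent, which is the asymmetry the paragraph above describes.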


RECORDING PERCENTAGES


Generally percentages are recorded to one decimal place. If the
percentages are based upon large figures, and particularly if one, or
more than one, part of a total is quite small (see Table 3.4), it may be
desirable to use more than one decimal. Occasionally only whole
percentages are shown, in order that relationships may be grasped
readily. Whole percentages will not suffice, however, when the relative
variations are extremely small.

Percentages should not be calculated if the absolute numbers are small,
especially if the base is appreciably less than 100. A serious difficulty
arising out of the use of percentages based on small absolute numbers is
discussed on page 150.

When percentages are to be recorded with one decimal, they are
rounded to the nearest tenth of one per cent. The following examples
will indicate the procedure in rounding percentages (and also in
rounding other calculations⁴ involving remainders):

⁴ See Appendix [letter illegible] for a more comprehensive discussion of
rounding numbers.

(1) $371.16 ÷ $679.28 = 0.5464, or 54.64 per cent. The second decimal
is less than 5 and therefore this percentage, to the nearest tenth of
one per cent, is 54.6.

(2) 2,319 pounds ÷ 7,532 pounds = 0.3079, or 30.79 per cent. In
this instance the second decimal is more than 5 and the percentage
should be recorded as 30.8.

(3) 280,511 feet ÷ 11,000,000 feet = 0.025501, or 2.5501 per cent.
Here the second decimal is 5, but there is a remainder which results in
the 1 in the fourth decimal place. Recorded to the nearest tenth of one
per cent, this figure is 2.6.

(4) 1,341 barrels ÷ 6,000 barrels = 0.2235, or 22.35 per cent. Here
the nearest tenth is either 22.3 or 22.4. It does not greatly matter
whether occasional results such as this are raised in the first decimal
place or whether the second decimal is dropped. However, it is better
to follow some consistent scheme. Particularly when many computations
are being made which are eventually to be added, it is well to
employ a method which will cause half of the values with a second
decimal of exactly 5 to be raised and half to be lowered. This practice
will avoid the accumulation of errors. Probably the most satisfactory
scheme is to raise the first decimal when the first decimal is an odd
number (67.35 becomes 67.4) and to drop the second decimal when the
first decimal is an even number (67.65 becomes 67.6).
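The odd-even scheme just described corresponds to what is now commonly
called rounding half to even. A minimal Python sketch follows, using the
standard decimal module; the function name round_to_tenth is our own.
The figures are passed as strings on the assumption that exact decimal
values are wanted, since binary floating-point numbers cannot represent
a value such as 67.35 exactly.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_to_tenth(value: str) -> Decimal:
    """Round a decimal figure to one decimal place; an exact 5 in the
    second decimal goes to whichever neighbor has an even first decimal."""
    return Decimal(value).quantize(Decimal("0.1"), rounding=ROUND_HALF_EVEN)


for figure in ("54.64", "30.79", "2.5501", "22.35", "67.35", "67.65"):
    print(figure, "->", round_to_tenth(figure))
```

Under this rule example (4), 22.35, is recorded as 22.4, because its
first decimal, 3, is odd.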

Reference to the percentage data shown in the last column of Table 8.6
will reveal that the eleven percentages add to 99.9 rather than to 100.0.
This is the consequence of rounding all percentages to one decimal place,
which sometimes results in totals of 99.9 or 100.1 and occasionally shows
99.8 (as in the next-to-the-last column of Table 8.6) or 100.2. Some
statisticians adjust one of the percentages in order to produce the
correct total (see note below Table 7.5), but it seems preferable to let
each percentage stand correctly rounded, as in Table 8.6.

TYPES OF COMPARISONS 

We have already seen an instance in which the parts of a whole were
compared to the total in Table 3.4. Here the percentages were obtained
by dividing each item in turn by the total. More expeditiously we may
take the reciprocal of the total and multiply the reciprocal by each of
the component figures. This is a time-saving device adapted particularly
to the calculating machine, and is applicable whenever we are dividing a
series of numbers by a constant number.
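The reciprocal device may be sketched in Python as follows. The
component figures are hypothetical, chosen only to make the arithmetic
easy to follow; they are not taken from Table 3.4.

```python
# Hypothetical component figures (an assumption for illustration).
parts = [120, 45, 35]
total = sum(parts)

# Divide once, then multiply each component by the stored reciprocal.
reciprocal = 1 / total
percentages = [round(100 * part * reciprocal, 1) for part in parts]

# The direct method, dividing each item in turn by the total.
direct = [round(100 * part / total, 1) for part in parts]

print(percentages)
print(direct)
```

On a desk calculating machine the saving lay in setting the constant
once; in a program the two methods give the same rounded results, and
the direct division is usually the clearer choice.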

Various illustrations of comparisons of one figure with another figure
are given on later pages in this chapter. For instance, in the paragraph
on sex ratios it is noted that each figure for males is divided by the
appropriate figure for females, since the sex ratio consists in stating
the number of males per 100 females.

Table 7.2 indicates a number of different comparisons which may be
made in regard to data arranged chronologically. In column 3, the
production of steel ingots and steel for castings for each year is
compared with the 1943 production; each figure is divided by that for
1943. Column 4 shows the percentage by which the production for each year
exceeded that for 1943; each year's numerical increase or decrease over
1943 is divided by the 1943 production. In column 5, the production
each year is related to that of the preceding year; each year's figure is
divided by that for the preceding year. Column 6 indicates the per cent
of increase or decrease of each year in relation to the preceding year;
the numerical increase (or decrease) of each year over the preceding year
is divided by the production for the preceding year. In columns 3 and 4,
comparisons are made with a fixed base, 1943. In columns 5 and 6, the
base is constantly shifting, being always the preceding year.
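The four kinds of comparison may be sketched in Python. The output
figures below are hypothetical and are not the steel data of Table 7.2;
the variable names are our own, and the comments tie each list to the
corresponding column described above.

```python
# Hypothetical annual output for four successive years (an assumption).
output = [200.0, 220.0, 198.0, 231.0]
base = output[0]  # fixed base: the first year

# Column 3: each year as a per cent of the fixed base.
pct_of_base = [round(100 * x / base, 1) for x in output]

# Column 4: per cent increase (or decrease) over the fixed base.
pct_change_from_base = [round(100 * (x - base) / base, 1) for x in output]

# Column 5: each year as a per cent of the preceding year (shifting base).
pairs = list(zip(output, output[1:]))
pct_of_previous = [None] + [round(100 * cur / prev, 1) for prev, cur in pairs]

# Column 6: per cent increase (or decrease) over the preceding year.
pct_change_from_previous = [None] + [
    round(100 * (cur - prev) / prev, 1) for prev, cur in pairs
]

print(pct_of_base)
print(pct_change_from_base)
print(pct_of_previous)
print(pct_change_from_previous)
```

The first year has no preceding year, so the shifting-base columns begin
with an empty entry.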

Another application of percentages is shown in Table 7.1. Here the
1951 figure for each crop is the base. The percentage columns headed
"per cent increase" indicate the relative increase or decrease in the
acreage harvested of each crop from 1951 to 1952.

SOME FREQUENTLY USED RATIOS 

The following paragraphs indicate a few interesting applications of
ratios and percentages. The reader will doubtless become aware of
many others as he reads more or less technical material in magazines,
newspapers, books, and advertisements.

Index numbers. Most index numbers are presented in the form of
percentages.⁵ In the construction of an index number of wholesale
prices, for example, the commodities to be included are selected first,
and their prices are then combined with due regard to the varying
importance of the different commodities. If the index number is a
chronological one, as is usually the case, some year may be designated
as the base and prices in that year are set equal to 100. The prices for
the other years are then expressed in relation to that base year. The
United States Bureau of Labor Statistics uses the average of the years
1947-1949 as the base period for its index numbers of approximately 2,000
wholesale prices. Wholesale prices during these three years are
therefore represented by 100. The wholesale price index number for June
1950 was 100.2; for December 1952, it was 109.6; for January 1953, it
was 109.9; for February 1953, it fell to 109.6. Prices for these months
are thus expressed in terms of the average for the thirty-six months of
1947-1949.

⁵ See Chapters 17 and 18 for a more complete discussion of index numbers.

Sex ratio. The relationship of the number of males to the number of
females in the population is given by the sex ratio, which states the
number of males per 100 females. In 1950 there were 74,833,239 males and
75,864,122 females in the United States. There were thus 98.6 males
per 100 females in this country. The ratio varied for the different
states. It was lowest in Massachusetts, where there were 93.8 males per
100 females, and highest in Wyoming, where there were 114.1 males per 100
females. The various nativity groups in the population showed different
sex ratios. Negroes had 94.3 males per 100 females; native Whites,
98.6 males per 100 females; foreign-born Whites, 103.8 males per 100
females; Japanese, 117.7 males per 100 females; and Chinese, 189.6 males
per 100 females.
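The national figure may be reproduced with a short Python sketch, using
the 1950 counts quoted in the text. The function name sex_ratio is our
own.

```python
def sex_ratio(males: int, females: int) -> float:
    """Number of males per 100 females."""
    return 100 * males / females


us_1950 = sex_ratio(74_833_239, 75_864_122)
print(round(us_1950, 1))
```

A value below 100 means that females outnumber males; a value above 100
means the reverse.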

Population density. Instead of merely comparing the total population
of two communities, it may often be more meaningful to consider
the density of the population. We do this by dividing the total
population by the area in square miles, and thus determine the number of
persons per square mile. For example, in 1950 the population of Montana
was 591,024 and the population of New Hampshire was 533,242. If we
relate these figures to the land area of each state, we find that New
Hampshire had 59.1 persons per square mile, while Montana had but
4.1 persons per square mile. These figures do not, of course, mean that
there were 59 or 60 persons on every square mile in New Hampshire and
4 or 5 persons on every square mile in Montana. They are merely summary
figures indicating that, on the average, there were the indicated
number of persons per square mile in each state.

Population density may also be used in making chronological comparisons.
As our country has grown older, the population density has
increased. In 1800 there were 6.1 persons per square mile in the United
States; in 1950 there were 50.7 persons per square mile.

Ratios per capita. Many figures are more meaningful or more useful
when expressed on a per capita basis. The Federal debt of the United
States reflects not only the level of expenditures in past years and
increases in government services, but also the growth of population.
For example, on June 30, 1940, the Federal debt was $48,497,000,000;
by June 30, 1952, the figure had grown to $259,151,000,000. If these
figures are divided by the population at the two periods, it appears that
the per capita Federal debt was $367 on June 30, 1940, and $1,650 on
June 30, 1952.

The consumption of various commodities is frequently stated on a per
capita basis. Thus in 1952 the estimated consumption of beef was 61.0
pounds per capita; the estimated consumption of eggs was 409 per capita;
the approximate amount of refined sugar consumed was 95.9 pounds per
capita.

Death rates. The crude, gross, or general death rate for a given year
is obtained by dividing the number of deaths occurring in a community
during that year by the mid-year population of that community, and
expressing the result per thousand. In 1952 there were in the United
States an estimated 1,494,000 deaths from all causes. The July 1, 1952,
population, resident in the United States, was estimated to be
155,707,000. The death rate for 1952 was therefore

1,494,000 ÷ 155,707,000 = 0.0096, or 9.6 per thousand.
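The division may be checked with a brief Python sketch, using the
estimated deaths (1,494,000) and the estimated mid-year population
(155,707,000) quoted in the text. The function rate_per is our own
helper; its per argument sets the unit of population in which the rate
is stated.

```python
def rate_per(events: float, population: float, per: int = 1_000) -> float:
    """Number of events per `per` persons in the population."""
    return events / population * per


crude_death_rate_1952 = rate_per(1_494_000, 155_707_000)
print(round(crude_death_rate_1952, 1))  # about 9.6 per thousand
```

The same helper serves for rates stated per 100,000 by passing
per=100_000.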

It will be seen that the accuracy of a death rate depends first upon the
degree of completeness of the registrations of death and second upon the
accuracy of the mid-year population estimate used as the base. Since
population counts are made only once in 10 years, most of the population
figures used must be estimates. When the population is estimated for a
year falling between two censuses, the estimate is termed an inter-censal
estimate; when the estimate is for a year after a census, it is termed
a post-censal estimate. Inter-censal estimates are naturally somewhat
more accurate than post-censal estimates. For the years 1951 to 1959
inclusive, death rates must at present be based upon post-censal
estimates and are called preliminary rates. After the 1960 census results
are available, inter-censal estimates may be made for the years
1951-1959, and the death rates may be recomputed upon the basis of these
new population estimates. Such rates are called revised rates.

When the deaths occurring in a state or city are divided by the
population of that community, the resulting crude death rate is subject
to certain corrections. For example, in any given year people may die in
a community who are residents elsewhere, and also some residents of any
large community may die outside of that community. If the nonresident
deaths are deducted from those which occurred in the community, the
resulting rate is referred to as a local rate. If, in addition, the
deaths of residents occurring outside of that community are added, the
resulting rate is referred to as a resident rate. Failure to recognize
these important differences may lead to drawing false conclusions. In
February 1935 it was announced that the death rate for Queens borough of
New York City was 6.5 per 1,000, for Bronx 7.8, for Brooklyn 9.3, for
Richmond 13.5, and for Manhattan 16.3. The death rate for Queens was
lower than for any other such community in the United States, and at
least one newspaper promptly announced that Queens was "the healthiest
place in the country." It was very quickly pointed out, however, that
Queens possessed a very low quota of hospitals and that, therefore, some
residents of Queens in need of hospital care would seek it in Manhattan
or elsewhere. Hospital cases naturally show a very high death rate, and
a crude death rate would not reflect the fact that some persons dying in
Manhattan and elsewhere were really residents of Queens.

Death rates for particular classes of the population (males and females,
various age groups, and other categories) and for particular diseases or
causes are referred to as specific death rates. Because the deaths from
any one cause are relatively few, cause-specific rates are usually stated
per 100,000 of the population. Thus in 1951 the death rate for rheumatic
fever was 1.1 per 100,000.

An intelligent comparison of the death rates of different communities
involves consideration of the fact that the proportions of the sexes may
differ and also that there may be differences in the age distributions,
in the racial and nativity composition of the inhabitants, in
occupations, and in other factors. A discussion of these differences and
the methods of computing adjusted and standardized death rates is too
specialized a topic to be treated in this volume.⁶

Birth rates. Birth rates are usually calculated by dividing the births
during a year by the mid-year population for that year. Just as in the
case of death rates, we may have preliminary rates and revised rates. We
may also have gross, local, and resident rates. Stillbirths are not
counted as births, although they have been so counted in the past; this
fact should be remembered in making chronological comparisons. Perhaps
it is also worth while calling attention to the fact that the
registration of births is not so complete as is the registration of
deaths. A death must be registered before a burial permit may be issued
and before interment may be made. A newborn infant, however, may be
absorbed into the family and the community whether or not his birth is
registered.

The calculation of birth rates in relation to the total population is not
thoroughly satisfactory, since the proportion of "child producers" in the
population is not constant either from time to time or from place to
place. Refinements in the calculation of birth rates are beyond the scope
of this volume.⁷

⁶ F. E. Linder and R. D. Grove, Vital Statistics Rates in the United
States, 1900-1940, Federal Security Agency, Public Health Service,
National Office of Vital Statistics, 1947; and Vital Statistics of the
United States, issued annually by the same office.

⁷ The references given in footnote 6 discuss birth rates in more detail
and also describe various other vital rates and ratios, such as morbidity
rates, case-fatality ratios, marriage rates, divorce rates, fertility
rates, stillbirth ratios, and others.

Crop yields per acre. Data of the total amount of a crop produced
may tell us whether or not there is more of that commodity available in
one year than in another. From such figures, however, we cannot know
if an increase may have been due to a more abundant yield, to an increase
in acreage, or to both. In 1951 there were 980,810,000 bushels of wheat
harvested from 61,492,000 acres in the United States; in the following
year, 70,585,000 acres yielded 1,291,447,000 bushels. Both the acreage
harvested and the total yield had risen, but the total yield had risen
proportionately more, so that the yield per acre increased from 16.0
bushels in 1951 to 18.3 bushels in 1952. On a geographical basis, the
United States, which produces more wheat than any other country for
which figures are available, is not first in yield per acre. Canada,
producing a little more than half as much wheat in 1952 as did the United
States, had a yield per acre of 26.5 bushels; and Western Germany, which
produced about one-tenth as much wheat as did the United States, showed
41.2 bushels per acre in 1952.
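The separation of the acreage effect from the yield effect may be
sketched in Python, using the United States wheat figures for 1951 and
1952 quoted in the text. The variable and function names are our own.

```python
bushels = {1951: 980_810_000, 1952: 1_291_447_000}
acres = {1951: 61_492_000, 1952: 70_585_000}

# Yield per acre for each year.
yield_per_acre = {year: bushels[year] / acres[year] for year in bushels}


def percent_change(new: float, old: float) -> float:
    """Change from `old` to `new`, as a percentage of `old`."""
    return 100 * (new - old) / old


acreage_change = percent_change(acres[1952], acres[1951])
production_change = percent_change(bushels[1952], bushels[1951])

for year, value in yield_per_acre.items():
    print(year, round(value, 1), "bushels per acre")
print(round(acreage_change, 1), "per cent more acres")
print(round(production_change, 1), "per cent more bushels")
```

Production rose by a larger percentage than acreage, which is why the
yield per acre increased.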

Hog-corn ratio. The hog-corn ratio is the result of dividing the
average price per 100 pounds which farmers receive for hogs by the
average price per bushel which farmers receive for corn. For example, if,
as on January 15, 1953, farmers are receiving $17.80 per 100 pounds for
hogs and $1.48 per bushel for corn, the ratio is $17.80 ÷ $1.48 = 12.0.
This ratio may be interpreted to mean that 100 pounds of hogs are 12.0
times as valuable as a bushel of corn or, more simply, that 12.0 bushels
of corn are equal in value to 100 pounds of hogs. On April 15, 1952, hogs
brought $16.40 per 100 pounds and corn yielded the farmer $1.68 per
bushel. At that time the ratio was 9.8. Over the 6-year period
1947-1952, the hog-corn ratio averaged about 13.2, falling as low as 9.2
in May 1948 and reaching 19.8 in February 1947. When the ratio is low,
it is more profitable for farmers to sell their corn outright than to
feed the corn to hogs being fattened for market. When the ratio is high,
it becomes more profitable for the farmer to feed corn to his hogs than
to sell the corn outright. Since corn is the principal element of cost in
producing hogs for market, the ratio is used as an indicator of the
desirability of future expansion or contraction of hog production. There
is thus a relationship between the hog-corn ratio and the hog production
cycle. When the ratio is high, an increase in hog production tends to
follow. Such an increase is frequently followed by a decline in hog
prices in relation to corn prices, and there then follows a tendency to
restrict hog production. Curves showing hog-corn ratios, by months, for
1948-1952 are shown in Charts 5.14 and 5.15.
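The two ratios quoted above may be reproduced with a short Python
sketch. The function name hog_corn_ratio is our own; the prices are
those given in the text.

```python
def hog_corn_ratio(hog_price_per_cwt: float, corn_price_per_bushel: float) -> float:
    """Bushels of corn equal in value to 100 pounds of hogs."""
    return hog_price_per_cwt / corn_price_per_bushel


january_1953 = hog_corn_ratio(17.80, 1.48)
april_1952 = hog_corn_ratio(16.40, 1.68)

print(round(january_1953, 1))
print(round(april_1952, 1))
```

Because the ratio divides one price by another, it is unaffected by a
change in the general price level that moves both prices in the same
proportion.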

Batting averages. The familiar batting average of the sport pages of
the daily paper is a ratio of the hits made by a batter in relation to
the total number of times he was at bat. Table 7.4 shows a series of
selected batting averages. The figures in the last column of Table 7.4
may be correctly thought of as either ratios to one or as averages of a
series of observations each having a value of 1 or 0 (that is, either
the batter did or did not make a hit). If a man has been at bat 75 times
and has made 25 hits, his batting average would be shown as .333 and is
spoken of as "three hundred and thirty-three." If he had made a hit
every time he was at bat, his figure would be 1.000, which is referred
to as "a thousand." Notice that certain contradictions are involved in
some of the terms used to refer to these data. The column of figures is
frequently headed "percentage"; the figures are printed as ratios to one;
the figures are spoken of as per thousand!
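The two readings of a batting average, as a ratio to one and as the mean
of a series of 1's and 0's, may be compared in a short Python sketch.
The example uses the 25 hits in 75 times at bat mentioned above; the
variable names and the formatting step are our own.

```python
# One observation per time at bat: 1 for a hit, 0 for no hit.
times_at_bat = [1] * 25 + [0] * 50

# The batting average as the mean of the observations.
mean_of_observations = sum(times_at_bat) / len(times_at_bat)

# The batting average as a simple ratio of hits to times at bat.
ratio_to_one = 25 / 75

# Printed in the sport-page style: three decimals, no leading zero.
printed = f"{mean_of_observations:.3f}".lstrip("0")

print(printed)
print(round(mean_of_observations * 1000))  # the figure as it is spoken
```

The same number thus appears as .333 in print and as "three hundred and
thirty-three" in speech, which is the ratio to one multiplied by 1,000.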


TABLE 7.4

Individual Batting Averages of 16 Outstanding American League Players,
1952

Player and club                      Games   Times    Hits   Batting
                                             at bat          average*

Fain, Ferris R., Philadelphia         145     538     176     .327
Mitchell, L. Dale, Cleveland          134     511     165     .323
Mantle, Mickey C., New York           142     549     171     .311
Kell, George C., Detroit-Boston       114     428     133     .311
Woodling, Eugene R., New York         122     408     126     .309
Goodman, William D., Boston           138     513     157     .306
Rosen, Albert L., Cleveland           148     567     171     .302
Avila, Roberto, Cleveland             150     597     179     .300
Fox, J. Nelson, Chicago               152     648     192     .296
Robinson, W. Edward, Chicago          155     594     176     .296
Di Maggio, Dominic P., Boston         128     486     143     .294
Bauer, Henry A., New York             141     553     162     .293
Nieman, Robert C., St. Louis          131     478     138     .289
Courtney, Clinton D., St. Louis       119     412     118     .286
Runnels, James E., Washington         152     555     158     .285
Groth, John T., Detroit               141     524     149     .284

* This column is headed "PCT" in the original table.
Data from American League of Professional Baseball Clubs, press release
for December 1[?], 1952.


Airline accident ratios. The safety of air travel may be indicated by
means of ratios. The number of plane-miles flown during a year may be
divided by the number of accidents to obtain "plane-miles flown per
accident." In 1952 scheduled domestic air-carrier operators flew
447,158,490 plane-miles and 36 accidents occurred. The lines therefore flew
12,421,069 plane-miles per accident. In the same year, there were 5
accidents involving a fatality, and dividing the plane mileage flown by 5
gives 89,431,698 plane-miles per fatal accident. During 1952 there were 46
passenger fatalities as a result of airplane accidents on scheduled
domestic airlines, and it appears that these lines flew 9,720,837
plane-miles per passenger fatality. Passenger fatalities may be related to
passenger-miles, and since scheduled domestic airlines flew 12,996,671,000
passenger-miles in 1952, we have 12,996,671,000 ÷ 46 = 282,536,326
passenger-miles flown per passenger fatality. Because of the small number
of accidents and fatalities involved, these ratios fluctuate tremendously
from year to year. For example, the passenger-miles flown per passenger
fatality were 80,910,867 in 1946; 31,725,186 in 1947; 75,249,940 in 1948;
76,032,710 in 1949; 87,118,531 in 1950; 79,111,993 in 1951; and
282,536,326 in 1952. It will be observed that, as air travel becomes safer,
all of the ratios mentioned will grow larger. Ratios of the number of
fatal accidents per million plane-miles and of the number of passenger
fatalities per 100 million passenger-miles may also be computed. Such
ratios would be reciprocals of those given and, as air travel becomes safer,
would become smaller.
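
The arithmetic of the airline ratios, and of their reciprocals, may be
sketched in Python as follows. The figures are the 1952 totals quoted
above; the function and variable names are ours and carry no official
standing.

```python
plane_miles = 447_158_490
passenger_miles = 12_996_671_000
accidents, fatal_accidents, passenger_fatalities = 36, 5, 46

def miles_per_event(miles, events):
    """Miles flown per event, rounded to the nearest mile."""
    return round(miles / events)

per_accident = miles_per_event(plane_miles, accidents)
per_fatal_accident = miles_per_event(plane_miles, fatal_accidents)
plane_miles_per_fatality = miles_per_event(plane_miles, passenger_fatalities)
passenger_miles_per_fatality = miles_per_event(passenger_miles,
                                               passenger_fatalities)

# Reciprocal forms: events per stated block of miles.
fatal_accidents_per_million_plane_miles = (
    fatal_accidents / (plane_miles / 1_000_000))
fatalities_per_100_million_passenger_miles = (
    passenger_fatalities / (passenger_miles / 100_000_000))

print(per_accident, per_fatal_accident)
print(plane_miles_per_fatality, passenger_miles_per_fatality)
print(round(fatal_accidents_per_million_plane_miles, 4),
      round(fatalities_per_100_million_passenger_miles, 3))
```

The first four figures grow as travel becomes safer; the last two, being
reciprocals on a different scale, shrink.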

The 100 per cent statement. When banks, insurance companies,
and other corporations present financial information to the public, they


TABLE 7.5

Assets of Provident Mutual Life Insurance Company, December 31, 1951
and December 31, 1952

                                                    Amount             Per cent of total
Asset                                          1951           1952        1951    1952

U. S. Government Bonds                    $117,78?,000   $102,768,000     17.5    14.7
Canadian Government and Provincial Bonds    18,984,000      9,401,000      2.8     1.3
State and Municipal Bonds                    1,861,000      3,661,000       .3      .5
Public Utility Bonds                       162,207,000    169,425,000     24.1    24.3
Railroad Bonds                              42,520,000     37,031,000      6.3     5.3
Industrial Bonds                           [illegible]    [illegible]     14.7    17.0
Preferred and Guaranteed Stocks             19,337,000    [illegible]      2.9     2.9
Common Stocks                              [illegible]     14,657,000      2.0     2.1
Mortgage Loans                             151,076,000    170,718,000     22.4    24.5
Real Estate Held for Investment              2,821,000      2,740,000       .4      .4
Home Office Property                         2,000,000      1,900,000       .3      .3
Other Real Estate                            2,786,000      2,093,000       .4      .3
Loans on Policies of the Company            23,230,000     23,657,000      3.5     3.4
Cash                                         5,312,000      5,296,000       .8      .8
Other Assets                                11,073,000     11,?59,000      1.6     1.6

  Total                                   $673,339,000   $698,080,000    100.0   100.0

Data from Provident Mutual Life Insurance Company of Philadelphia,
Eighty-eighth Annual Report, 195?, p. ?. Several percentages above were
adjusted by the company to make the total of each column of percentages
equal 100.0.


find it effective to supplement the dollar figures with percentages. Thus,
a financial statement may show each asset as a percentage of all assets,
and each liability as a percentage of all liabilities. The procedure is
particularly effective when the dollar figures are large. Table 7.5 shows
the assets of the Provident Mutual Life Insurance Company as set forth in
an annual report. The actual figures, even though rounded, are too
large for the ordinary reader to grasp and compare, but the percentage
data make comparisons less difficult. In preparing such a percentage
statement, it is desirable not to show too many decimal places, else
comparisons cannot readily be made. A statement of the resources of a
bank carried all percentages to three decimal places. This was quite
unnecessary, particularly since the smallest item, "sundry securities,"
was 0.035 per cent (slightly less than that before rounding) and could
have been shown as 0.03 per cent, and since the second smallest item,
"other assets," was 0.039 per cent and could have been shown as 0.04 per
cent. For popular presentation, there is some advantage in lumping such
small items together in order to center attention upon the more important
ones. These two small items, if combined, would have appeared as 0.07 per
cent, or as 0.1 per cent with all percentages shown to but one decimal
place. However, it may have been desired to emphasize the smallness of
either "sundry securities" or "other assets," or both.
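
A percentage statement of this kind is easily sketched in Python. The
bank below is hypothetical: its dollar figures are invented so that the
two smallest items behave like those just described, and the function
name and the `lump_below` option are our own devices, not a standard
routine.

```python
# Hypothetical resources of a bank, in dollars (illustrative figures only).
assets = {
    "Loans": 52_400_000,
    "Bonds": 31_260_000,
    "Cash": 16_266_000,
    "Sundry securities": 34_600,
    "Other assets": 39_400,
}

def percent_statement(items, decimals=1, lump_below=None):
    """Express each item as a per cent of the total.

    Items whose share is below `lump_below` per cent are combined
    into a single line before rounding.
    """
    total = sum(items.values())
    shares = {name: 100 * amount / total for name, amount in items.items()}
    if lump_below is not None:
        small = {n: s for n, s in shares.items() if s < lump_below}
        shares = {n: s for n, s in shares.items() if s >= lump_below}
        if small:
            shares["Small items combined"] = sum(small.values())
    return {name: round(share, decimals) for name, share in shares.items()}

three_places = percent_statement(assets, decimals=3)
two_places = percent_statement(assets, decimals=2)
lumped = percent_statement(assets, decimals=1, lump_below=0.05)

print(three_places["Sundry securities"], two_places["Sundry securities"])
print(lumped)
print(round(sum(lumped.values()), 1))
```

In this invented case the rounded percentages total 100.1 rather than
100.0, which suggests why a company may adjust a few rounded figures so
that each column adds to exactly 100.0.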

Railroad ratios. The efficient operation of railroads necessitates the
collection and use of a vast amount of statistical data in connection with
which numerous ratios are calculated. The figures which follow are for
Class I railroads for 1952.

The investment per mile of line is obtained by dividing total investment
in road and equipment (including cash, materials, and supplies) by
the number of miles of railroad line. This figure was $119,820 per mile,
or, allowing for accrued depreciation, $118,072 per mile.

Freight revenue per ton-mile is obtained by dividing total freight
revenue by the total number of ton-miles of freight hauled. The freight
revenue per ton-mile was 1.430 cents. Similarly, we may compute the
passenger revenue per passenger-mile, which amounted to 2.663 cents.

The operating ratio is the ratio of operating expenses to operating
revenues. Operating expenses were $8,053,003,585, while operating
revenues were $10,581,418,115. The operating ratio was 76.11 per cent.
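
As a check on the arithmetic, the operating ratio can be recomputed from
the two dollar totals just given. The short Python sketch below uses
variable names of our own choosing.

```python
operating_expenses = 8_053_003_585
operating_revenues = 10_581_418_115

# Operating ratio: expenses as a per cent of revenues.
operating_ratio = 100 * operating_expenses / operating_revenues

print(f"{operating_ratio:.2f} per cent")
```

A lower operating ratio means that a smaller share of each revenue dollar
is absorbed by operating expenses.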

There are a number of other railroad ratios; the meaning of each is
rather obvious. Enumerating a few: the gross revenue per ton of freight
was $0.75; the haul per ton of freight was 427 miles; the revenue per
passenger was $1.93; the average trip per passenger was 72.5 miles; the rate
of return on aggregate property investment was 4.11 per cent; the hours
worked during the year per railroad employee were 2,320; the percentage
of unserviceable freight cars averaged 4.9 during the year; the ton-miles
per day per freight car were 973; the mileage per day per freight car was
40.2 miles.⁵

The railroad ratios mentioned above are one type of business ratios.
Many sorts of business organizations compute diverse ratios for the better
functioning of the enterprise. Discussed in another volume⁶ are such
ratios as current ratio (current assets ÷ current liabilities), merchandise
turnover (net sales ÷ merchandise inventory), margin of profit (profit ÷
sales), and labor turnover (replacements ÷ number on payroll).

⁵ For these and other railroad ratios, see A Yearbook of Railroad
Information, issued annually by the Committee on Public Relations of the
Eastern Railroads, New York.

FAULTY USE OF PERCENTAGES 

Ratios and percentages are in such general use that it is not surprising
to find them occasionally misused. Difficulties encountered in the
calculation and use of percentages can generally be traced to one of the
following causes: (1) confusion in regard to the base, (2) calculation of
percentages based on small absolute numbers, (3) misplaced decimal points,
(4) arithmetic mistakes, (5) improper procedure in averaging percentages.
These will be discussed in order.

Confusion in regard to base. Over a period of five years, from 1916
to 1921, the enrollment in veterinary colleges in the United States
declined from 3,160 to 641 students. The decrease was 2,519 students,
or 79.7 per cent of the 1916 enrollment, yet the dean of a midwestern
veterinary college was quoted as having said that from 1916 to 1921 the
enrollment had decreased 500 per cent! The dean may have actually
said that the 1916 registration figure was about 500 per cent of the 1921
figure. A decrease of 500 per cent would mean a negative enrollment
four times the size of the 1916 registration.
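
The distinction between "a decrease of so many per cent" and "so many
per cent of" can be made concrete with a small Python sketch. The
enrollments are those of the example (3,160 students, less the decrease
of 2,519, leaves 641); the two helper functions are our own.

```python
def percent_change(old, new):
    """Change from old to new, as a per cent of the OLD figure."""
    return 100 * (new - old) / old

def percent_of(value, base):
    """value expressed as a per cent of base."""
    return 100 * value / base

enrollment_1916, enrollment_1921 = 3160, 641

decrease = percent_change(enrollment_1916, enrollment_1921)
earlier_as_per_cent_of_later = percent_of(enrollment_1916, enrollment_1921)

# What a literal "decrease of 500 per cent" of the 1916 figure would leave.
after_500_per_cent_cut = enrollment_1916 * (1 - 500 / 100)

print(round(decrease, 1))
print(round(earlier_as_per_cent_of_later))
print(after_500_per_cent_cut)
```

The base is the 1916 figure in the first computation and the 1921 figure
in the second; confusing the two produces the dean's "500 per cent."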

In the autumn of 1920 a determined effort was made by the United
States district attorney to have restaurants in Pittsburgh lower their
prices to a pre-war level. Newspapers announcing the success of the
drive stated that Pittsburgh restaurants had cut their prices 50 to 100 per
cent. It is, of course, clear that prices cannot be cut 100 per cent, else
the servings formerly sold would be given away! The price reductions
on a number of dishes were stated; the greatest reduction took place in
the price of doughnuts and pie. These had formerly sold at 15 cents per
order. Identical-size servings were sold at 5 cents after the reduction;
hence, the reduction amounted to 66.7 per cent of the former selling price.

It is not at all unusual to see an advertisement claiming "prices reduced
100 per cent." Of course, this should mean that goods are being given
away. One company even went so far as to advise that their catalog
would enable one to "save from 60 to 200 per cent."

The most serious confusion in regard to a base seems to be present in
a mail-order house guarantee of tires. The concern claims that the
guarantee is "without limit as to mileage, months or years of service," and
that tires will be repaired free or replaced at a charge "only for the
actual amount of mileage you have received." Literally, the base is
infinity, and, if the guarantee were to be fully carried out for all tire
buyers, the company would quickly have to cease selling tires. In
fairness to the concern involved, it should be noted that their adjustment
policy is a generous one.

⁶ See F. E. Croxton and D. J. Cowden, Practical Business Statistics, second
edition, Prentice-Hall, Inc., New York, 1949, pp. 65-73.

Percentages from small numbers. An almost classic illustration of
the undesirability of using percentages based upon small numbers is
given by Chaddock.⁷

A short time after Johns Hopkins University had opened certain
courses in the University to women, it was reported that 33⅓ per cent
of the women students had married into the faculty of the institution.

Of course the important information was the number of women students.
There were only three. When dealing with a small number of cases, the
use of percentages alone leads to wrong impressions. In these cases
either percentages should not be used at all or the numbers upon which
they are based should accompany the percentages.

Ordinarily, percentages should not be computed unless the base
consists of 100 or more cases.

Misplaced decimal points. Mistakes involving misplaced decimal
points may lead to gross misinterpretations. They are a common sort of
mistake and should be guarded against. Sir Josiah Stamp⁸ gives a rather
unusual illustration:

A periodical return of revenue received into the Exchequer was laid
before Lord Randolph, and his private secretary, Mr. George Gleadowe
of the Treasury, was looking over his shoulder, and Lord Randolph
expressed satisfaction at the fact that the Customs revenue had increased
by 34 per cent, as compared with the corresponding period in the
preceding year. Mr. Gleadowe pointed out to him that it was only .34 per
cent. "What difference does that make?" asked Lord Randolph.
When it was explained to him he said, "I have often seen those damned
little dots before, but I never knew until now what they meant."

Misplaced decimal places involve mistakes of such a rudimentary
nature that the reader may feel they are too elementary to be mentioned
here. However, a research report from a state university stated that
during a year the military forces of the United States had consumed
8.7 per cent of the coffee available during that year. The figures from
which the percentage was computed were 24 and 2,756 millions of pounds.
The correct figure is 0.87 of one per cent.

⁷ Robert E. Chaddock, Principles and Methods of Statistics, Houghton
Mifflin Co., Boston, 1925, pp. 13-14.

⁸ Sir Josiah Stamp, Some Economic Factors in Modern Life, p. 205. P. S.
King and Son, London, 1929.

A feature writer for a metropolitan newspaper, discussing the Navaho
Indians, said, "The known Navaho death rate is 360 per 100,000."
Stated in the usual fashion, this would be 3.6 per 1,000 or, roughly,
one-third of the rate for the United States, which was 10.6 during the same
year. Although the basic data from which the Navaho death rate was
computed were of dubious value, it is known that the figure is much
larger than that for the entire country. The feature writer not only
misplaced a decimal (he had intended to say 3,600 per 100,000, which is 36
per 1,000), but may have made an arithmetic mistake as well.

It is of interest to note that a misplaced decimal always involves a
serious misstatement, since the least mistake that can occur results in
the incorrect figure being 10 times as large as it should be or one-tenth as
large as it should be.

People seem most likely to misplace decimals (1) when large
absolute numbers are involved or (2) when one of the absolute numbers
is very large (or small) in relation to the other, resulting in a very large
(or small) ratio. Two illustrations will suffice.

Over a period of years, the resources of a bank grew from $100,000 to
$300,000,000. A newspaper stated that the growth was 3,000 per cent.
Actually, the second figure is 3,000 times the first figure, or 300,000 per
cent of it, and the growth was 299,900 per cent.

An advertisement pointed out that more than 200,000,000 checks a day
are paid in the United States, and that about 99.9995 per cent of them
are good. Said the advertisement, "Only 1 out of 2,000 is dishonored."
The percentage and the ratio are in disagreement. Correspondence
revealed that about 1,000 checks per day were bad, so that the ratio
should have been "1 out of 200,000."
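
Both illustrations can be verified mechanically. The sketch below is
only a check of the arithmetic; it uses Python's decimal module for the
check percentages so that 99.9995 is handled exactly, and all names are
our own.

```python
from decimal import Decimal

# The bank: resources grew from $100,000 to $300,000,000.
old, new = 100_000, 300_000_000
times_as_large = new / old                    # second figure / first figure
per_cent_of_old = 100 * new / old             # second figure as a per cent
growth_per_cent = 100 * (new - old) / old     # increase as a per cent

# The checks: 99.9995 per cent are good.
checks_per_day = 200_000_000
bad_per_cent = Decimal("100") - Decimal("99.9995")
one_bad_in = int(Decimal("100") / bad_per_cent)
bad_checks_per_day = checks_per_day // one_bad_in

# The per cent good that "1 out of 2,000" would actually imply.
implied_good_per_cent = Decimal("100") - Decimal("100") / Decimal("2000")

print(times_as_large, per_cent_of_old, growth_per_cent)
print(one_bad_in, bad_checks_per_day, implied_good_per_cent)
```

Carrying the computation both ways, from percentage to ratio and from
ratio back to percentage, is a simple guard against a misplaced decimal.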

Arithmetic mistakes. Early in 1953 a prominent government
official stated, according to newspapers, that Russian Communists
eliminated 800,000,000 persons, and compared this figure with the United
States population of about 150,000,000. The ratio, he is alleged to have
said, was 7 to 1. The correct ratio is 5.33 to 1.

Improper averaging of percentages. The occasional necessity for
averaging percentages calls for mention of a pitfall and for consideration
of the proper procedure. Consider the figures of Table 3.5: it is desired to
know the average proportion of White persons who were foreign born for
the New England division. If we add the six percentages and divide by
six, we have 72.1 ÷ 6 = 12.0 per cent. This figure, however, does not
correctly represent the situation; the six percentages were calculated from
different bases and therefore should be weighted accordingly. The
easiest procedure for obtaining the correct percentage consists of totaling
the White population for the six states (9,161,156 persons), totaling the
foreign-born White population (1,286,051 persons), and dividing the
second figure by the first. The result is 14.0 per cent, which is the
proportion of foreign-born White persons in the New England division. The
same result could also be obtained by averaging the six percentage figures,
provided each is weighted according to the base from which it has been
calculated. This procedure of multiplying each percentage by its base,
summing the results, and dividing by the sum of the base figures (or
weights) is essentially the same as the method just used. The result,
however, is a little less accurate, since each percentage figure has been
rounded. The error involved in rounding a given percentage is magnified
when the percentage is multiplied. But since some percentages are
understated and some are overstated, there is a tendency for these errors
to counterbalance. Under certain conditions, it may be appropriate
to average percentages without weighting them according to their bases.
This is discussed on pages 183-184.
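
The difference between the unweighted and the weighted average of
percentages may be sketched in Python. The three areas below are
hypothetical and their figures are invented to keep the arithmetic
visible (they are not the states of Table 3.5); only the two New England
totals in the last lines come from the text.

```python
# Hypothetical areas: (name, base population, foreign-born population).
areas = [("A", 1_000, 50), ("B", 4_000, 800), ("C", 5_000, 1_000)]

bases = [base for _, base, _ in areas]
percentages = [100 * part / base for _, base, part in areas]

# Pitfall: a simple average ignores the unequal bases.
unweighted = sum(percentages) / len(percentages)

# Proper procedure 1: weight each percentage by its base.
weighted = sum(p * b for p, b in zip(percentages, bases)) / sum(bases)

# Proper procedure 2: divide the total of the parts by the total of the bases.
from_totals = 100 * sum(part for _, _, part in areas) / sum(bases)

# The New England totals quoted in the text.
new_england = 100 * 1_286_051 / 9_161_156

print(unweighted, weighted, from_totals)
print(round(new_england, 1))
```

With unrounded percentages the two proper procedures agree exactly; they
part company slightly only when the percentages have first been rounded.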

One method of organizing and summarizing statistical data consists in
the formation of a frequency distribution. In this device the various
items of a series are classified into groups and the number of items falling
into each group is stated. A frequency distribution is shown in Table
8.3. Sometimes the user of statistics will find frequency distributions
already constructed in the publications to which he may refer; sometimes
he will construct his own frequency distribution from unclassified data.
We shall begin our discussion of the frequency distribution by first
considering the appearance of the raw or unclassified data.

RAW DATA 

The unclassified data from which a frequency distribution might be
made may appear as do the data of Table 8.1. Here we have the grades
received for the four-year course by the 225 cadet-midshipmen of the 1952
graduating class of the United States Merchant Marine Academy. The
arrangement of the grades is according to the alphabetical order¹ of the
cadet-midshipmen's names, though we have omitted the names in order
to save space. Another illustration of raw data, from which a frequency
distribution might be constructed, is the payroll of a factory. The
employees on the payroll may be listed alphabetically by name; by
employee number; by departments, and then by name or number; by
seniority; or in some other convenient order. Considering the grades of
the cadet-midshipmen as shown in Table 8.1, it is apparent that very
little information is forthcoming unless the figures are rearranged. When
the data are listed as in Table 8.1, it is a tedious task to find even the
lowest grade and the highest grade. It is even more difficult to ascertain
around what value the grades tend to concentrate, or if, indeed, they do
show such a concentration. These and other steps in analysis are
facilitated by rearranging and summarizing the data.

¹ A slight rearrangement was made in Table 8.1 so that identification of
the grade of any particular cadet-midshipman is impossible.

TABLE 8.1

Grades Received for the Four-Year Course by 225 Cadet-Midshipmen of the
1952 Graduating Class of the United States Merchant Marine Academy

Data from United States Merchant Marine Academy. For the purposes of our
illustration, the grades, originally given to two decimals, were rounded
to one decimal.


THE ARRAY 

In Table 8.2, the cadet-midshipmen grades have been rearranged in
descending order. Such an arrangement (whether ascending or descending)
is called an array. It arranges the items in order of magnitude. We
have not summarized; that will be done when we construct the frequency
distribution. A consideration of the array puts us in a position to learn
something from the data. First, the array enables us to see at once the
range of the grades, which varied from 72.1 to 89.6. Second, it may also
be observed that there is a concentration of grades between 78 and
80. This will be more clearly seen when we examine the frequency
distribution and consider measures of central tendency. Third, a somewhat
more extended examination gives us a rough idea of the distribution of
the grades. We may observe, for example, that there are few grades
below 74 or above 87. This particular feature of the series will be much
more readily studied when we have the frequency distribution. Fourth,
it may be noticed that the figures show a fair degree of continuity. If
the grades are expressed as whole percentages, all consecutive values from
72 to 90 are represented. If we consider the figures as shown, to one
decimal place, we may observe that within the range of 75.0 to 85.0
inclusive, which includes 189 of the 225 cadet-midshipmen, 86 of the
possible 101 values are to be found. If the grades had been for a larger
number of students, this tendency would have been more marked.

TABLE 8.2

Array of Grades Received for the Four-Year Course by 225 Cadet-Midshipmen

The array, however, is a cumbersome form of the data. Furthermore,
it is troublesome to construct, because of the necessity of rearranging all
the items. One fairly satisfactory method of constructing an array
consists of recording the figures on small cards and sorting the cards. Of
course, if the data are punched on mechanical tabulating cards, the
construction of an array is simple.
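
Where the data are held in a computer rather than on cards, an array is
simply a sorted list. The Python sketch below uses ten hypothetical
grades (not the 225 of Table 8.1) to show the array and the range that it
reveals at once.

```python
# Ten hypothetical grades in the order in which they were recorded.
grades = [78.4, 81.2, 79.9, 72.1, 85.0, 80.3, 77.6, 89.6, 79.1, 83.5]

# The array: the same items in descending order of magnitude.
array = sorted(grades, reverse=True)

# The range can be read from the two ends of the array.
highest, lowest = array[0], array[-1]

print(array)
print(lowest, highest)
```

Nothing has been summarized; the array holds every item of the raw data,
merely rearranged.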

When studying grades, we may frequently want to make an array.
Some institutions publish each year a roll of the graduating class, listing
the names and standings of the students in order from highest to lowest.
If we are interested in a campaign to raise funds for a hospital or
community chest, it might be very useful (for publicity purposes, for example)
to list the individual gifts in descending order. It is obvious, however,
that such a listing of 500 or 1,000 contributions would be cumbersome and
of limited value. In many instances there is no particular advantage in
making an array. It would be a waste of time for a concern to make an
array of the amounts paid to its employees each month. There is not
much reason why a bank should make an array of the daily balances of
its many depositors. On the other hand, a student of vital statistics might
find it very valuable in a study of birth rates to array the various cities in
ascending or descending order and consider the reasons for the differences.

THE FREQUENCY DISTRIBUTION 

The array of Table 8.2 rearranged the midshipmen's grades. The
frequency distribution of Table 8.3 summarizes the grades into 9 groups or
classes. It is obvious that the frequency distribution does not show the
details given in the array, but much is gained by the summarization. We
can see that the lowest grade is not below 72 and that the highest grade is
not quite 90; we cannot ascertain the exact values of the highest and
lowest grades as we did from the array. The concentration of grades in the
neighborhood of 78-80 is apparent at a glance. If we draw a curve of
the frequency distribution, as in Chart 8.1, we can visualize the data
readily and we may make comparisons with other series, as discussed in
a later section of this chapter. Having classified the data, we are in a
position to make rapid computations of certain values (discussed in the
following chapters) which will assist us in describing and analyzing the
data.

TABLE 8.3

Frequency Distribution of Grades Received for the Four-Year Course by 225
Cadet-Midshipmen of the 1952 Graduating Class of the United States
Merchant Marine Academy

When an array is available, the frequency distribution may be made by
merely counting the items. It is not advisable, however, to make an
array solely for the purpose of making the frequency distribution, because
too great an amount of time is required to construct the array.

Chart 8.1. Grades Received for the Four-Year Course by 225
Cadet-Midshipmen of the 1952 Graduating Class of the United States Merchant
Marine Academy. (Vertical axis: number of cadet-midshipmen.)

TABLE 8.4

Entry Form for Grades Received for the Four-Year Course by 225
Cadet-Midshipmen of the 1952 Graduating Class of the United States Merchant
Marine Academy

If the data are in unorganized form, as in Table 8.1, we may construct
a frequency distribution by a scoring device similar to that shown in
Chapter 2. Another method of handling the figures consists of making an
entry form such as that of Table 8.4. This is less laborious than making
an array and has certain advantages over the scoring procedure. The
advantages of the entry form are: (1) we can scan the columns to see if
any item is incorrectly entered; (2) we can total the items entered and
check this total against the total of the unclassified data; (3) if we should
decide that we want classes of 1 per cent or 3 per cent instead of 2 per
cent, we can re-form our frequency distribution with little effort; (4) as
will be shown in the next chapter, the entry form enables us to find out
how closely the mid-value of a class agrees with the average of the items
in that class. If desired, the classes used in the entry form may be
narrower than we think we shall want for the frequency distribution.
These classes may then be readily combined into wider ones, using
whatever interval and whatever class limits seem advisable.
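
The entry form has a natural counterpart in Python: a dictionary whose
keys are the lower class limits and whose values are the lists of items
entered in each class. The sketch below uses twelve hypothetical grades
and a function of our own devising; it illustrates advantages (1) to (3)
above and is not a reproduction of Table 8.4.

```python
import math
from collections import defaultdict

def entry_form(values, lower, width):
    """Enter each value under the lower limit of its class.

    Classes run from lower + k*width up to, but not including,
    lower + (k+1)*width.
    """
    form = defaultdict(list)
    for value in values:
        k = math.floor((value - lower) / width)
        form[lower + k * width].append(value)
    return dict(sorted(form.items()))

# Twelve hypothetical grades, in the order recorded.
grades = [78.4, 81.2, 79.9, 72.1, 85.0, 80.3,
          77.6, 89.6, 79.1, 83.5, 76.2, 78.8]

form = entry_form(grades, lower=72, width=2)

# (1) Scan each column for items entered in the wrong class.
entries_ok = all(lo <= v < lo + 2 for lo, items in form.items() for v in items)
# (2) Check the number of entries against the unclassified data.
count_ok = sum(len(items) for items in form.values()) == len(grades)

# The frequency distribution is the count of entries in each class.
frequencies = {lo: len(items) for lo, items in form.items()}

# (3) Re-form the distribution with classes of 3 instead of 2.
entered = [v for items in form.values() for v in items]
wider = {lo: len(items)
         for lo, items in entry_form(entered, lower=72, width=3).items()}

print(frequencies)
print(wider)
```

Because the individual items are retained, the classes can be re-formed
at will, which a bare tally of scores does not permit.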

All the class intervals of the frequency distribution of Table 8.3 are 2
per cent. Charting and computations are facilitated when the class
intervals are all the same. Whenever possible, therefore, frequency
distributions should be constructed with uniform class intervals. This,
however, is not always practicable. Table 8.5 shows a frequency
distribution which has non-uniform class intervals. In this instance the
result is to give more detailed information for the secretaries having lower
earnings.

TABLE 8.5

Average Straight-Time Weekly Earnings of Female Secretaries in
Non-Manufacturing Industries in New York City, January 1952

Data from United States Bureau of Labor Statistics, Occupational Wage
Survey, New York, New York, January 1952, page 10.

Selecting the number of classes. No hard-and-fast rule can be
given as to the number of classes into which a frequency distribution
should be divided. If there are too many classes, many of them will
contain only a few frequencies and the distribution may show
irregularities which are not attributable to the behavior of the variable being
measured. If there are too few classes, so many frequencies will be
crowded into a class as to cause much information to be lost. The
number of classes to use depends partly upon the nature of the data (as
will be noted for meal checks in the next section), and partly upon the
number of frequencies in the series. The greater the number of
frequencies, the more classes we may have. The regularity with which the
frequencies are distributed within the range of values under consideration
is also a determining factor. The more regular the distribution of the
frequencies, the more classes we may use, since data having a high degree
of regularity may be divided into a large number of classes without
showing unwarranted gaps and irregularities in the frequencies. In general, it
might be said that fewer than 6 or 8 classes should rarely be used, and
that more than 16 classes would be useful only for working with extensive
data. For illustrative purposes, 9 classes were used in Table 8.3. When
the number of classes has been determined,² the range of values for the
entire distribution indicates the class interval to be used.
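
The step from the range and the number of classes to the class interval
may be sketched as follows, using the lowest and highest grades of the
array (72.1 and 89.6) and the 9 classes of the illustration. Rounding the
interval up to a whole number and starting the first class at a multiple
of that interval are assumptions of ours, made for convenience.

```python
import math

lowest, highest, n_classes = 72.1, 89.6, 9

# The range divided by the number of classes suggests the interval.
suggested = (highest - lowest) / n_classes

# Assumption: round up to a convenient whole-number interval.
interval = math.ceil(suggested)

# Assumption: begin the first class at a multiple of the interval.
first_lower = math.floor(lowest / interval) * interval

classes = [(first_lower + i * interval, first_lower + (i + 1) * interval)
           for i in range(n_classes)]

print(round(suggested, 2), interval)
print(classes[0], classes[-1])
```

The nine classes so formed run from 72 up to 90 and therefore take in
every grade from the lowest to the highest.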

Selecting class limits. It was pointed out in Chapter 4 that the
mid-value of each class is used to represent the class. The mid-values of
the classes are made use of not only when charting the frequency
distribution, but also in making various computations to be discussed in
later chapters. If the limits of each class are not clearly indicated, the
mid-value, which is the average of the upper and lower limits, cannot be
properly determined. The adequacy of the mid-value assumption will
be discussed more fully in Chapter 9. It is important at this point to
make clear that, when a frequency distribution is being constructed, the
class limits should be so chosen that the mid-value of each class will
coincide, so far as possible, with any values around which the data tend to be
concentrated.

Suppose that measurements are made of the academic standing of a
large group of college freshmen upon a numerical scale ranging from 0 to
100. The data could be expected to be graduated fairly smoothly from,
say, 50 to nearly 100. There would be students rating 88.0 and others
89.0; in addition, there would be still others falling between these two
values. If a large enough group were to be measured, the minuteness of
the variations between 88.0 and 89.0 would be limited only by the accuracy
of the measuring instrument (in this case, the grading system). There
would not be a series of values around which the frequencies would tend to
concentrate, and the problem mentioned at the end of the preceding
paragraph would not arise.

² Snedecor has suggested that the class interval for a frequency
distribution should be not larger than one-fourth of the estimated population
standard deviation (see Chapter 21) of the data. See G. W. Snedecor,
Statistical Methods, 4th ed., Collegiate Press, Ames, Iowa, 1946, p. 170.

For our figures the estimated population standard deviation, computed from
the raw data, is 3.67. Following Snedecor's rule, the class intervals
should be 0.9 or less in width, so that we would have 20 or more classes.
Note that this rule requires the time-consuming computation of the
estimated population standard deviation from the ungrouped data, and also
that it fails to take into consideration the number of items involved.

On the other hand, consider the meal checks of a cafeteria, many (but
not all) of which are a multiple of 5 cents.* In this instance, the class
intervals should be written 8-12 cents, 13-17 cents, 18-22 cents, and so
forth, thus giving mid-values of 10 cents, 15 cents, 20 cents, and so on,
which coincide with the concentration points.
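
The choice of limits for the meal checks can be sketched in Python by
building each class around its concentration point. The function is our
own and assumes whole-cent values with an interval of 5 cents.

```python
def classes_around(points, half_width):
    """Class limits placed symmetrically about each concentration point."""
    return [(p - half_width, p + half_width) for p in points]

def mid_values(classes):
    """Mid-value of a class: the average of its lower and upper limits."""
    return [(lower + upper) / 2 for lower, upper in classes]

concentration_points = [10, 15, 20]          # cents; multiples of 5

good_classes = classes_around(concentration_points, half_width=2)
poor_classes = [(10, 14), (15, 19), (20, 24)]  # limits set at the multiples

print(good_classes, mid_values(good_classes))
print(poor_classes, mid_values(poor_classes))
```

With limits set at the multiples of 5 cents the mid-values fall at 12, 17,
and 22 cents, two cents above the values at which the checks actually
concentrate.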

The data of freshmen grades and the ratings of midshipmen are
illustrations of what is termed a continuous variable, since the values are capable
of infinitely small variations from each other. Heights and weights of
people are also continuous variables. Length of life is another
illustration. The data of cafeteria meal checks are illustrative of a discrete or
discontinuous variable, since the values differ from each other by finite
amounts; in this case, one cent. A discrete variable need not show the
concentrations which were present in the meal-check data. For example,
if many workmen are employed at similar tasks and are paid on a
piece-rate basis (that is, upon the basis of amount produced), it is quite possible
that there may be individuals receiving $61.21, $61.22, $61.23, and so
forth, for a week's work. Although piece rates might be, and often are,
in fractions of a cent, the weekly payment must be in terms of whole cents.

The foregoing suggests an important consideration; namely, that we
are not so much concerned with the fact that a variable is discrete as
we are with the fact that the data may be broken and that there are
inherent gaps and concentrations in the actual data in hand. Such a
situation often occurs when dealing with salaries. One organization
with several hundred employees paid salaries ranging from about $1,200
to more than $15,000 per year. There was in no sense an evenly graduated
distribution between these limits. The gaps between adjacent
values ranged from $10 to $5,000, and there were pronounced concentrations
at various customary salaries such as $2,500, $3,000, $3,600, $4,000,
$4,500, $5,000, and so on. The selection of class limits for a distribution
of this type presents great difficulty. Often it is not possible to adjust
the mid-values to coincide with all concentration points. An approximate
adjustment must then suffice.

The fact that we may be dealing with a continuous variable does not
warrant us in selecting class limits blindly. If data are being collected
concerning weights of individuals, reported to the nearest pound, persons
reported as weighing 142 pounds would vary between 141.5 pounds and
142.5 pounds; as a group, they would average about 142 pounds. Suppose,
however, that weight is reported to the last full pound. In that
event, persons reported as weighing 142 pounds would vary between
exactly 142 pounds and just under 143 pounds; as a group, they would
average about 142.5 pounds. Let us assume that a frequency distribution
with class interval of 3 pounds is to be formed. If weights have
been reported to the nearest pound, it is correct to write class intervals
“142-144, 145-147, 148-150,” and so on, with mid-values of 143, 146,
149, and so forth. If, however, weights have been reported to the last
full pound, the above is incorrect, but it is correct to write “142 and under
145, 145 and under 148, 148 and under 151,” and so on, with mid-values
of 143.5, 146.5, 149.5, and so forth.

Sometimes, when dealing with a continuous variable, the classes are
written so that the limits appear to overlap. For example, the data of
cadet-midshipmen’s grades could have been classified 72.0-74.0, 74.0-76.0,
76.0-78.0, and so on. When this is done, frequencies which fall on
a class limit are divided between the two classes, usually resulting in some
fractional frequencies in the distribution.* A frequency distribution
using these classes may be easily constructed from the array of Table 8.2
or the entry form of Table 8.4. Overlapping class limits are not often
used for data of grades.

Curves of frequency distributions. The graphic representation of a
frequency distribution was discussed in Chapter 4. Although a frequency
distribution may be represented by either a column diagram or a
curve, it is usual to employ the latter device. (We shall make use of the
column diagram in Chart 8.5 and in Chapter 23.) One advantage of the
curve is that two or more curves may readily be drawn on the same axes
for purposes of comparison. In any event, the first step in the analysis
of a frequency distribution should be the construction of a chart, for it
will tell us at a glance with which of the following types of distributions
we are dealing.

Chart 8.1, showing the graphic appearance of the data of cadet-midshipmen’s
grades, is not symmetrical, but is slightly skewed to the
right. (Skewness is discussed in Chapter 10.) Many frequency distribution
curves encountered in the social sciences are asymmetrical and
frequently are skewed to the right. Only rarely do we find a curve
skewed to the left.

* For an example, see F. E. Croxton, Elementary Statistics with Applications in
Medicine, Prentice-Hall, Inc., New York, 1953, pp. 41-42.
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NUMBER Of 
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\ tiarl 8.2. Hcightn of 9,.S52 Male lii<lu«trial Workern. Dutu from 
il Health Study of Ten Thousand Male Industrial Workers, p. 50, United 
States Public Health Service, Public Health Bulletin No. 162, 

Chart 8.3. Age at Death of 371 American Inventors. Data from “Bio-Social
Characteristics of American Inventors,” by Sanford Winston, American Sociological
Review, Vol. 2, No. 6, pp. 837-849. (Vertical axis: number of inventors.)

Biological and anthropometrical series (especially those involving linear
measurements, such as height, rather than two- or three-dimension
measurements, such as waist circumference or weight) frequently yield
curves which are roughly symmetrical. Such a series is shown in Chart
8.2, which pictures the height distribution of a large group of male
industrial workers.

A curve which is skewed to the left is shown in Chart 8.3, which
depicts the age at death of 371 American inventors. As pointed out in
Chapter 10, where the amount of skewness in this series is ascertained,
the skewness may be characteristic of the variable or may be due to the
fact that nearly one-fifth of the inventors included in the study were born
before 1800.

The curve of Chart 8.4 indicates the length of time during which cars
were parked in Albuquerque, New Mexico, and shows a great many cars
parked for short periods and generally smaller numbers parked for longer
lengths of time. Curves having this characteristic “reverse J” shape
may be encountered occasionally.

Chart 8.4. Parking Time of Motor Vehicles in Albuquerque, New Mexico.
The data are from the Automotive Safety Foundation and are for June,
July, and August. (Vertical axis: thousands of vehicles.)

Graphic representation when the class intervals are unequal.
For some frequency distributions, it is not feasible to maintain the same
class interval throughout. The distribution of Table 8.5 has eleven
classes of $2.50, six classes of $5.00, three classes of $10.00, and one class
of indeterminate width. It would not have been desirable to have used
$2.50 class intervals throughout, since that would have necessitated 35
classes to cover the range from $32.50 to $120.00. This would be too
many classes to be useful and would provide a more detailed breakdown
than needed for the upper ranges of the series. Class intervals of $5.00
throughout would not have been desirable either, since details concerning
secretaries having earnings of less than $60.00 per week would have been
lost.

Chart 8.5. Frequency Densities of Average Straight-Time Weekly Earnings
of 14,817 Female Secretaries in Non-Manufacturing Industries in New
York City, January 1952. Data from Table 8.5. (Vertical axis: number of
women per $2.50 of earnings; horizontal axis: dollars.)

To draw a suitable chart of the data of Table 8.5, it is necessary to
make adjustments for the varying class intervals. The class “$60.00 but
less than $65.00” is twice as wide as the classes which precede it. We
do not know how many of the 2,679 secretaries earned $60.00 but less
than $62.50 a week and how many earned $62.50 but less than $65.00 a
week. We can say, however, that on the average there were 1,339.5
secretaries in each of the two halves of the class “$60.00 but less than
$65.00.” Adjustments of this sort have been made in the last column
of Table 8.5, where the frequencies are stated per $2.50 of earnings.
These are frequency densities.
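The adjustment just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `frequency_densities` is hypothetical, and of the three classes below only the “$60.00 but less than $65.00” class with its 2,679 secretaries comes from the text; the other two rows are invented to show classes of other widths.

```python
def frequency_densities(classes, base_width=2.50):
    """Express each class frequency per `base_width` dollars of class width.

    `classes` is a list of (lower_limit, upper_limit, frequency) tuples.
    """
    densities = []
    for lower, upper, frequency in classes:
        width = upper - lower
        densities.append(frequency * base_width / width)
    return densities


classes = [
    (57.50, 60.00, 1200),   # hypothetical class of the standard $2.50 width
    (60.00, 65.00, 2679),   # from the text: a class twice the standard width
    (90.00, 100.00, 400),   # hypothetical class four times the standard width
]

densities = frequency_densities(classes)
for (lower, upper, frequency), density in zip(classes, densities):
    print(f"${lower:.2f} but less than ${upper:.2f}: "
          f"{frequency} women, {density:.1f} per $2.50")
```

The open-ended last class of a table has no known width, so it cannot be passed to a function of this kind; it must be noted separately, as is done on Chart 8.5.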

The distribution of secretaries' earnings may now be plotted in terms
of the frequency densities, as in Chart 8.5. It is not possible to make an
estimate of the width of the last class interval in Table 8.5, so no adjustment
of the frequencies of that class has been made. Notice on the
chart how the reader's attention was called to the presence of these 27
secretaries. Alternatively, the data of frequency densities could have
been shown by a curve instead of a column diagram, and this was done in
Chart 4.25. However, the column diagram makes it easier for the
reader to note the changing class width. The irregularities of Chart 8.5
do not indicate that too many classes were used. They are due to the
nature of the basic data, there being concentrations on weekly salaries of
$50 and $55.

Graphic comparison of frequency distributions. Table 8.6
shows two frequency distributions, one giving the straight-time weekly
earnings of 940 class B bookkeeping-machine operators, the other presenting
the straight-time weekly earnings of 457 key-punch operators. Both
series are for females only. If the two distributions dealt with approximately
the same number of women, we could merely plot two frequency
curves on the same grid and study their outlines. The result of doing this
for the two series of Table 8.6 is shown in Chart 8.6. The comparison is
not particularly illuminating, although it is obvious that the most
prevalent earnings are a little higher for key-punch operators than for
bookkeeping-machine operators.

Chart 8.6. Average Straight-Time Weekly Earnings of 940 Female Bookkeeping-Machine
Operators, Class B, and of 457 Key-Punch Operators, in
Finance, Insurance, and Real Estate Offices in Philadelphia, October 1952.
Data from Table 8.6. (Vertical axis: number of women; horizontal axis:
dollars, 30 to 60.)

If each frequency is expressed as a
percentage of the total of which it is a part, we obtain the percentage
frequency distributions, which are also given in Table 8.6. Plotting the
two percentage frequency distributions, as in Chart 8.7, enables us to
make a graphic comparison of the two series, which is no longer complicated
because of the different number of items. The relative importance
of all of the various classes may now readily be seen.
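The conversion to percentage frequencies may be sketched as follows. The two series below are hypothetical and are chosen only because their totals differ greatly; they are not the figures of Table 8.6, and the function name `percentage_frequencies` is an assumption made for the illustration.

```python
def percentage_frequencies(frequencies):
    """Express each class frequency as a percentage of the series total."""
    total = sum(frequencies)
    return [100.0 * f / total for f in frequencies]


# Hypothetical series with the same classes but very different totals.
series_a = [40, 120, 200, 40]   # 400 items
series_b = [10, 40, 40, 10]     # 100 items

pct_a = percentage_frequencies(series_a)
pct_b = percentage_frequencies(series_b)

for a, b in zip(pct_a, pct_b):
    print(f"{a:5.1f}  {b:5.1f}")
```

Each percentage series totals 100, so curves drawn from them enclose comparable areas regardless of how many items each series contains.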

The comparison of the two series of Table 8.6 was facilitated because
the class intervals were the same. If two series, expressed in the same
units but having different class intervals, are to be compared graphically,

TABLE 8.6

Straight-Time Weekly Earnings of 940 Female Bookkeeping-Machine
Operators, Class B,* and of 457 Key-Punch Operators in Finance,
Insurance, and Real Estate Offices in Philadelphia, October 1952

                                       Number             Per cent of total
                               Bookkeeping-    Key-     Bookkeeping-    Key-
Weekly earnings                  machine      punch       machine      punch
                                operators   operators    operators   operators

$30.00 but less than $32.50         37          15           3.9         3.3
$32.50 but less than $35.00         78          41           8.3         9.0
$35.00 but less than $37.50        179          55          19.0        12.0
$37.50 but less than $40.00        191          76          20.3        16.6
$40.00 but less than $42.50        184          91          19.6        19.9
$42.50 but less than $45.00         85          57           9.0        12.5
$45.00 but less than $47.50         83          45           8.8         9.8
$47.50 but less than $50.00         65          43           6.9         9.4
$50.00 but less than $52.50         32          22           3.4         4.8
$52.50 but less than $55.00          5          11           0.5         2.4
$55.00 but less than $57.50          1           1           0.1         0.2

Total                              940         457         100.0       100.0

* A Class B operator “keeps a record of one or more phases or sections of a
set of records usually requiring some knowledge of basic bookkeeping. Phases or sections include
accounts payable, payroll, customers’ accounts (not including simple type of billing described under
biller, machine), cost distribution, expense distribution, inventory control, etc. May check or assist
in preparation of trial balances and prepare control sheets for the accounting department.”

Data from U. S. Bureau of Labor Statistics, Wages and Salaries in Philadelphia, Pennsylvania,
October 1952, Preliminary Release, Table A-1.

we may plot frequency densities per unit (that is, per dollar, per pound,
or whatever the unit may be). If the two series also differ appreciably
in regard to the number of items involved, the areas under the two curves
may be made the same by computing percentage frequencies and expressing
the percentage frequencies as frequency densities.

Occasionally we wish the differences between the numbers of items in
two series to be apparent, as in Charts 24.1-24.4, and in such a situation
we do not use percentage frequencies. Frequency densities would, however,
be used when needed, as in Charts 24.1, 24.3, and 24.4.

When two frequency distributions are expressed in terms of different
units (dollars, pounds, inches, and so on), a direct graphic comparison is
not feasible, since there is no simple way in which the X-scales may be
adjusted to each other. Certain computed values, to be discussed later,
may be used to obtain effective numerical comparison.


Chart 8.7. Percentage Distributions of Average Straight-Time Weekly
Earnings of 940 Female Bookkeeping-Machine Operators, Class B, and of 457
Key-Punch Operators, in Finance, Insurance, and Real Estate Offices in
Philadelphia, October 1952. Data from Table 8.6. (Vertical axis: per cent
of women.)

Cumulative frequency distributions and the ogive. The data of
Table 8.3 show the usual (non-cumulative) form of the frequency distribution
and enable us to ascertain the number of cadet-midshipmen
falling in each class. Sometimes, however, it may be useful to know how
many or what proportion of students received less than certain stated
grades, or to know how many or what proportion of students received
specified grades or above. This information may be seen clearly in a
cumulative table such as Table 8.7. In this table the frequencies of
Table 8.3 have been accumulated upon a “less than” basis and also upon
an “or more” basis.
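The two kinds of accumulation can be sketched in Python as below. The class frequencies are those of the cadet-midshipmen's grades as they appear in the frequency distribution of this text; the variable names are assumptions made for the illustration.

```python
from itertools import accumulate

# Class frequencies for grades 72.0-73.9, 74.0-75.9, ..., 88.0-89.9.
frequencies = [7, 31, 42, 54, 33, 24, 22, 8, 4]
total = sum(frequencies)

# "Less than" basis: number falling below the upper limit of each class.
less_than = list(accumulate(frequencies))

# "Or more" basis: number equalling or exceeding the lower limit of each class.
or_more = [total - below + f for below, f in zip(less_than, frequencies)]

# The same two accumulations expressed as percentages of the total.
less_than_pct = [round(100 * c / total, 1) for c in less_than]
or_more_pct = [round(100 * c / total, 1) for c in or_more]

print(less_than)
print(or_more)
```

Plotting `less_than` against the upper class limits and `or_more` against the lower class limits gives the two ogives.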

TABLE 8.7

Cumulative Distributions of Grades of the 1952 Graduating
Class of the United States Merchant Marine Academy

                  Number of cadet-midshipmen    Per cent of cadet-midshipmen
                        whose grades                    whose grades
                 Fell below     Equalled or      Fell below     Equalled or
                 the upper      exceeded the     the upper      exceeded the
Grade            limit of       lower limit      limit of       lower limit
                 each class     of each class    each class     of each class

72.0-73.9             7             225              3.1           100.0
74.0-75.9            38             218             16.9            96.9
76.0-77.9            80             187             35.6            83.1
78.0-79.9           134             145             59.6            64.4
80.0-81.9           167              91             74.2            40.4
82.0-83.9           191              58             84.9            25.8
84.0-85.9           213              34             94.7            15.1
86.0-87.9           221              12             98.2             5.3
88.0-89.9           225               4            100.0             1.8

When cumulative frequency distributions are drawn, the frequencies
are plotted opposite the appropriate class limits, resulting in curves such
as those shown in Chart 8.8. Such curves are called ogives.

Chart 8.8. Cumulative Distributions of Grades of the 1952
Graduating Class of the United States Merchant Marine
Academy. Data of Table 8.7. (Vertical axis: number of cadet-midshipmen;
horizontal axis: grade, 72 to 90.)

Cumulative frequency tables and ogives are often used to present data
of wages and of hours of work. With reference to wages, they enable us
to ascertain how many (or what proportion) of a group receive less than a
subsistence level, standard level, or comfort level. Similarly, we can
ascertain the number or proportion receiving a subsistence level or more,
a standard level or more, and a comfort level or more. It is also possible
to ascertain what wage the lowest- (or highest-) paid 10, 25, 50, or other
per cent of the workers are receiving. With respect to hours of work, we
can see quickly the number or proportion working unusually long or short
hours.

If two cumulative frequency distributions are based upon nearly the
same number of items, their ogives may be plotted and compared in
absolute terms. If, however, the two series are based upon different
totals, the comparison must be based upon the percentage frequencies,
just as in the case of comparing two frequency distributions in non-cumulative
form, which was previously discussed.



Symbols Used in Chapter 9

\(\beta_1\): lower-case Greek beta, a measure of skewness. See Chapter 10.

\(\beta_2\): lower-case Greek beta, a measure of kurtosis. See Chapter 10.

\(d\): deviation of an \(X\) value from \(X_d\).

\(d'\): deviation, in terms of class intervals, of an \(X\) value from \(X_d\).

\(\Delta_1\): upper-case Greek delta, the difference between the frequency of the
modal class and the frequency of the class graphically to the left of the
modal class.

\(\Delta_2\): upper-case Greek delta, the difference between the frequency of the
modal class and the frequency of the class graphically to the right of the
modal class.

\(f\): a frequency.

\(f_1, f_2, f_3, \ldots\): the frequencies associated with \(X_1, X_2, X_3, \ldots\)

\(G\): the geometric mean.

\(H\): the harmonic mean.

\(i\): the class interval.

\(l_1\): the lower limit of a class.

\(l_2\): the upper limit of a class.

Med: the median.

Mo: the mode.

\(n\): as used in the “compound interest formula,” the number of years (or
other time units) from the beginning to the end of the period.

\(N\): the number of items in a sample.

\(P_0\) and \(P_n\): as used in the “compound interest formula,” respectively, the
value at the beginning and at the end of the period.

\(Q_1, Q_2, Q_3\): the quartiles. \(Q_2\) = Med.

\(\Sigma\): upper-case Greek sigma, meaning “take the sum of.”

\(r\): as used in the “compound interest formula,” the ratio of increase or
decrease per year (or other time unit).

\(s\): the standard deviation of a sample. See Chapter 10.

\(x\): the deviation of a value from \(\bar{X}\).

\(x_1, x_2, x_3, \ldots\): deviations of \(X_1, X_2, X_3, \ldots\) from \(\bar{X}\).

\(X\): a value in a series; also, the mid-value of a class in a frequency
distribution.

\(X_1, X_2, X_3, \ldots\): the values in a series; also, the mid-values of the
classes of a frequency distribution.

\(X_d\): a designated mean used as a first approximation to facilitate the
computation of \(\bar{X}\) of a frequency distribution.

\(\bar{X}\): the arithmetic mean. In later chapters, we shall distinguish between
the arithmetic mean of a sample, \(\bar{X}\), and the arithmetic mean of the
population, \(\bar{X}_{\mathcal{P}}\).

\(\infty\): infinity.



CHAPTER 9 


Measures of Central Tendency 


We have seen how to construct a frequency distribution and how to
draw a frequency curve. From either the classified data or the chart,
it is obvious that there are certain values that are frequently present and
others that occur less frequently. Most of the curves that we encounter
are of the type that is very roughly “bell-shaped,” as shown in Charts
8.1, 8.2, and 8.3. For such series as these charts represent, it is obvious
that the more characteristic values are in the central part of the distributions.
We therefore use the term measures of central tendency to identify
the values which may be computed in an attempt to characterize this
aspect of a frequency distribution. We shall discuss in this chapter the
arithmetic mean, the median, the mode, and, briefly, the geometric mean
and the harmonic mean.

In the following chapter we shall consider measures of dispersion, which
refer to the spread of a distribution; measures of skewness, which measure
the direction and amount of asymmetry; and measures of kurtosis, which
indicate the degree of “peakedness” of a series.

THE ARITHMETIC MEAN 

The arithmetic mean from ungrouped data. The arithmetic
mean is in such constant everyday use that nearly all of us are familiar
with the concept. Sometimes we refer to the arithmetic mean merely as
“the average” or “the mean,” but we always use the appropriate adjective
when we are speaking of the geometric mean, the harmonic mean, or some
other less usual mean.

The arithmetic mean of a series of items is obtained by adding the
values of the items and dividing by the number of items. Suppose that,
in a certain small city, carrots are selling for 8¢, 10¢, 11¢, and 12¢ a
pound. The arithmetic mean of these four figures would be given by

\[ \frac{8 + 10 + 11 + 12}{4} = \frac{41}{4} = 10.25 \text{ cents}. \]





If we let \(X_1, X_2, X_3\), etc., indicate the various values; \(N\), the number of
items; and \(\bar{X}\), the arithmetic mean, we have

\[ \bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N}. \]

Or, more briefly, using the summation symbol \(\Sigma\), we may say

\[ \bar{X} = \frac{\Sigma X}{N}. \]

The foregoing computation of the arithmetic mean involved no consideration
of the fact that different quantities of carrots may have been
sold at the various prices. When an arithmetic mean is computed in this
fashion, it may be referred to as a simple arithmetic mean. It is not correct
to refer to this mean as an unweighted arithmetic mean, since each
of the prices was weighted equally. Let us proceed to compute a properly
weighted arithmetic mean, considering the fact that there were sold
10,000 pounds of carrots at 8¢, 8,000 pounds at 10¢, 4,000 pounds at 11¢,
and 1,000 pounds at 12¢. We now have

\[ \bar{X} = \frac{(10{,}000 \times 8) + (8{,}000 \times 10) + (4{,}000 \times 11) + (1{,}000 \times 12)}{23{,}000} = \frac{216{,}000}{23{,}000} = 9.39 \text{ cents}. \]


If we use the symbols \(f_1, f_2, f_3\), etc., to indicate the numbers or frequencies
associated with each value being averaged, we have

\[ \bar{X} = \frac{f_1X_1 + f_2X_2 + f_3X_3 + \cdots}{f_1 + f_2 + f_3 + \cdots} = \frac{\Sigma fX}{\Sigma f}. \]

Ordinarily an arithmetic mean is considered to be a weighted arithmetic
mean, as just described, unless otherwise specified.
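The difference between the simple and the weighted arithmetic mean of the carrot prices can be checked with a short Python sketch. The prices and poundages are those given in the text; the variable names are assumptions made for the illustration.

```python
prices = [8, 10, 11, 12]                  # cents per pound
pounds = [10_000, 8_000, 4_000, 1_000]    # pounds sold at each price

# Simple arithmetic mean: every price is given the same weight.
simple_mean = sum(prices) / len(prices)

# Weighted arithmetic mean: each price is weighted by the pounds sold at it.
total_value = sum(f * x for f, x in zip(pounds, prices))
weighted_mean = total_value / sum(pounds)

print(f"simple mean:   {simple_mean:.2f} cents")
print(f"weighted mean: {weighted_mean:.2f} cents")
```

The weighted mean is lower than the simple mean here because the largest quantities were sold at the lowest prices.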

It should be noted that, although the arithmetic mean price of carrots
is 9.39¢ per pound, no carrots were actually sold at this exact price per
pound. The arithmetic mean must therefore be thought of as a computed
value and not as a value which actually exists.

Properties of the arithmetic mean. One important property of the
arithmetic mean is that the algebraic sum of the deviations of the various
values from the mean equals zero. This is important, since it will enable
us to develop a method for computing \(\bar{X}\) which will save an appreciable
amount of time when we are dealing with a frequency distribution. Let
us consider a series of five values, 6, 8, 9, 11, 14, each one of which occurs
but once. Then

\[ \bar{X} = \frac{6 + 8 + 9 + 11 + 14}{5} = \frac{48}{5} = 9.6. \]

Now let us compute the deviation of each value from the arithmetic mean,
\(x_1 = X_1 - \bar{X}\), \(x_2 = X_2 - \bar{X}\), \(x_3 = X_3 - \bar{X}\), etc. We have

     X        x

     6      -3.6
     8      -1.6
     9      -0.6
    11      +1.4
    14      +4.4

It will be observed that \(\Sigma x = 0\); this is always true for any series of
values.¹

If we compute the deviations \(d\) of the five items from some designated
value which is not the arithmetic mean, the sum of these deviations \(\Sigma d\)
will not equal zero. If the designated value is less than the arithmetic
mean, there will be too many positive deviations and the sum of the
deviations will be greater than zero. If the designated value is greater
than the arithmetic mean, there will be too many negative deviations and
the sum of the deviations will be a negative quantity. Since each of the
five (\(N\)) items has been compared to a designated number which is not
the true mean, the sum of the deviations will fail to equal zero by an
amount which is exactly five (\(N\)) times the amount by which the designated
value deviates from the actual arithmetic mean. It is therefore
possible to designate some value as an assumed mean \(X_d\), to determine the
deviations from this designated value, and, by adding (algebraically) the
necessary correction \(\Sigma d / N\), to obtain the arithmetic mean.² The process is
illustrated in Table 9.1, where \(X_d\) is taken as 9. Here it is observed that
\(\Sigma d = +3\). If we divide this figure by \(N\), we see that \(X_d\) was too small
by 0.6. This is given by

\[ \frac{\Sigma d}{N} = \frac{+3}{5} = +0.6. \]

This is the correction to be added to the assumed mean; thus,

\[ \bar{X} = X_d + \frac{\Sigma d}{N} = 9 + \frac{3}{5} = 9.6, \]

which agrees exactly with \(\bar{X}\) computed by adding the values and dividing
by 5.

¹ See Appendix S, section 9.1. If \(\Sigma x = 0\), it is obvious that \(\Sigma x / N = 0\). \(\Sigma x / N\) is
referred to as the “first moment about the mean,” or merely as the “first moment.”
In the following chapter we shall have occasion to consider the second moment
\(\Sigma x^2 / N\), the third moment \(\Sigma x^3 / N\), and the fourth moment \(\Sigma x^4 / N\).

² See Appendix S, section 9.2.

TABLE 9.1

Calculation of the Arithmetic Mean, \(\bar{X}\),
by Use of the Assumed Mean, \(X_d = 9\)

     X        d

     6       -3
     8       -1
     9        0
    11       +2
    14       +5

 Total       +3

\[ \Sigma d = +3; \qquad \bar{X} = X_d + \frac{\Sigma d}{N} = 9 + \frac{3}{5} = 9.6. \]


In the foregoing illustration, \(X_d\) was less than \(\bar{X}\). Suppose we choose
\(X_d\) as 13. The computations are shown in Table 9.2.

TABLE 9.2

Calculation of the Arithmetic Mean, \(\bar{X}\),
by Use of the Assumed Mean, \(X_d = 13\)

     X        d

     6       -7
     8       -5
     9       -4
    11       -2
    14       +1

 Total      -17

\[ \Sigma d = -17; \qquad \bar{X} = X_d + \frac{\Sigma d}{N} = 13 + \frac{-17}{5} = 9.6. \]

In this case, \(X_d\) was larger than \(\bar{X}\), as is indicated by
\(\Sigma d / N = -17/5 = -3.4\). The result is, as before, \(\bar{X} = 13 - 3.4 = 9.6\).
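The assumed-mean procedure can be sketched in Python as follows. The five values and the two assumed means (9 and 13) are those used in the text; the function name `mean_from_assumed` is an assumption made for the illustration.

```python
def mean_from_assumed(values, assumed_mean):
    """Arithmetic mean computed as an assumed mean plus the correction sum(d) / N."""
    deviations = [x - assumed_mean for x in values]   # d = X - X_d
    correction = sum(deviations) / len(values)        # sum(d) / N
    return assumed_mean + correction


values = [6, 8, 9, 11, 14]

from_9 = mean_from_assumed(values, 9)     # assumed mean below the true mean
from_13 = mean_from_assumed(values, 13)   # assumed mean above the true mean
direct = sum(values) / len(values)

print(from_9, from_13, direct)
```

Whatever value is designated as the assumed mean, the correction brings the result back to the same arithmetic mean.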

A second property of the arithmetic mean, which is of importance in
connection with later discussions, is that the sum of the squared deviations,
\(\Sigma x^2\), is less when the deviations are taken around \(\bar{X}\) than when they are
taken around any other value. This is demonstrated in Appendix S,
Section 10.1.
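A numerical check of this second property, using the same five values, may be sketched as below. It is an illustration only and not the demonstration given in the appendix; the function name `sum_of_squares` is an assumption.

```python
def sum_of_squares(values, center):
    """Sum of squared deviations of the values from `center`."""
    return sum((x - center) ** 2 for x in values)


values = [6, 8, 9, 11, 14]
mean = sum(values) / len(values)            # 9.6

about_mean = sum_of_squares(values, mean)   # deviations taken around the mean
about_9 = sum_of_squares(values, 9)         # deviations taken around 9
about_13 = sum_of_squares(values, 13)       # deviations taken around 13

print(about_mean, about_9, about_13)
```

The sum taken around any other value exceeds the sum taken around the mean by N times the square of the distance between that value and the mean, which is why the mean gives the smallest total.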

The arithmetic mean from grouped data: long method. Table
9.3 shows the frequency distribution of the grades of cadet-midshipmen,
and it is desired to ascertain the value of \(\bar{X}\) for the series. When dealing
with a frequency distribution, we do not ordinarily have the original data
from which the frequency distribution was made. When we do have the
unclassified data (as in Table 8.1), we can obtain the value of the arithmetic
mean most accurately by totaling the values and dividing by the
number of items. When we have only the frequency distribution, we
must compute the mean from the grouped data. Let us proceed to compute
\(\bar{X}\) for the frequency distribution of Table 9.3, and then compare our
result with the arithmetic mean computed from the unclassified data.

In computing the arithmetic mean from a frequency distribution, we
take the mid-value (sometimes called the class mark) of each class as
representative of that class, multiply the various mid-values by their
corresponding frequencies, total these products, and divide by the total
number of items. Symbolically, if \(X_1, X_2, X_3, \ldots\) represent the mid-values
and \(f_1, f_2, f_3, \ldots\) the frequencies, then

\[ \bar{X} = \frac{f_1X_1 + f_2X_2 + f_3X_3 + \cdots}{N} = \frac{\Sigma fX}{N}. \]

The mid-value of a class is obtained by adding the upper and lower
limits of the class and dividing by 2. For every frequency distribution,
we must consider carefully what those limits are. For the distribution
of Table 9.3, we might take the limits of the first class as 72.0 and 74.0,

TABLE 9.3

Computation of the Arithmetic Mean for Grades of
the 1952 Graduating Class of the United States
Merchant Marine Academy by Use of the
Expression \(\bar{X} = \Sigma fX / N\)

                Number of
                  cadet-         Mid-value
Grade          midshipmen        of class          fX
                    f                X

72.0-73.9           7              72.95          510.65
74.0-75.9          31              74.95        2,323.45
76.0-77.9          42              76.95        3,231.90
78.0-79.9          54              78.95        4,263.30
80.0-81.9          33              80.95        2,671.35
82.0-83.9          24              82.95        1,990.80
84.0-85.9          22              84.95        1,868.90
86.0-87.9           8              86.95          695.60
88.0-89.9           4              88.95          355.80

Total             225                          17,911.75

\[ \bar{X} = \frac{\Sigma fX}{N} = \frac{17{,}911.75}{225} = 79.61. \]



giving a mid-value of 73.0. This would be correct if the grades had each
been rounded to the last completed tenth, so that 72.0 included values
ranging from exactly 72 to 72.099..., 72.1 included values from exactly
72.1 to 72.199..., and so on, instead of having been rounded to the
nearest tenth, as was actually done. If rounding had been to the last
completed tenth, the class should have been designated “72 and under
74.” Since we are dealing with a continuous variable, the limits of such
a class would be 72 and 74, and the mid-value 73. For the cadet-midshipmen's
grades, rounding was to the nearest tenth, and the lowest value
which could fall in the class “72.0-73.9” is 71.95, while the highest value
is 73.9499.... Thus, since the variable is continuous, the class limits
are 71.95 and 73.95, and the mid-value is 72.95. The mid-values have
been entered in Table 9.3 according to this procedure.

When a class is designated (for example) “32.00-33.99,” the mid-value
is actually 32.995. Many statisticians would, however, state the mid-value
as 33.00, since the relative discrepancy is small. In determining the
mid-values for a frequency distribution, it is important to know how the
readings were rounded. When no information concerning the rounding
is given in connection with the frequency distribution, it is probably best
to assume that figures were rounded to the nearest unit given. For
example, if a one-inch class is written “12.0-12.9 inches,” consider the
limits as 11.95 and 12.95 inches; if a five-pound class is written “10-14
pounds,” consider the limits as 9.5 and 14.5 pounds. However, for discrete
data, a $2 class “$10.00-$11.99” has the limits $10.00 and $11.99,
and a $10 class “$70-$79” has the limits $70 and $79 if data were given
only in whole dollars. A class should not be written “5 pounds but under
10 pounds” unless we mean exactly what we say; namely, that items in
this class do not fall below 5 pounds and do not equal 10 pounds. If the
classes for the cadet-midshipmen's grades were written 72.0-74.0,
74.0-76.0, and so on, and if cases falling on a class limit were divided
between the two classes, as noted in Chapter 8, the mid-values would be
73.0, 75.0, and so on.
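The effect of the rounding rule on class limits and mid-values can be sketched in Python. The sketch assumes a continuous variable only (the discrete cases above are not covered), and the function names `true_limits` and `mid_value` and the `rounding` argument are assumptions made for the illustration.

```python
def true_limits(stated_lower, stated_upper, unit, rounding="nearest"):
    """True class limits of a continuous variable from the stated limits.

    rounding="nearest": readings were rounded to the nearest `unit`.
    rounding="last":    readings were cut back to the last completed `unit`.
    """
    if rounding == "nearest":
        return stated_lower - unit / 2, stated_upper + unit / 2
    if rounding == "last":
        return stated_lower, stated_upper + unit
    raise ValueError("rounding must be 'nearest' or 'last'")


def mid_value(lower, upper):
    """Mid-value of a class: half the sum of its limits."""
    return (lower + upper) / 2


# Grades written "72.0-73.9", recorded in tenths.
grades_nearest = mid_value(*true_limits(72.0, 73.9, 0.1, "nearest"))
grades_last = mid_value(*true_limits(72.0, 73.9, 0.1, "last"))

# Weights written "142-144", recorded in whole pounds.
weights_nearest = mid_value(*true_limits(142, 144, 1, "nearest"))
weights_last = mid_value(*true_limits(142, 144, 1, "last"))

print(round(grades_nearest, 2), round(grades_last, 2))
print(weights_nearest, weights_last)
```

The same stated class thus has two possible mid-values, and only knowledge of how the readings were rounded tells us which one to use.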

Considering the mid-values for the grades of cadet-midshipmen as
discussed above, and using the expression \(\bar{X} = \Sigma fX / N\), we find that the
arithmetic mean is 79.61, as shown below Table 9.3. From the unclassified
data of Table 8.1, let us compute the value of \(\bar{X}\) to see how nearly the
figure just obtained agrees with that value. If we total all of the individual
grades and divide by 225, we have

\[ \bar{X} = \frac{17{,}912.3}{225} = 79.61. \]
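The long method can be verified with a short Python sketch. The mid-values and frequencies are those of the cadet-midshipmen's grades, and 17,912.3 is the total of the individual grades stated in the text; the variable names are assumptions made for the illustration.

```python
mid_values = [72.95, 74.95, 76.95, 78.95, 80.95, 82.95, 84.95, 86.95, 88.95]
frequencies = [7, 31, 42, 54, 33, 24, 22, 8, 4]

n_items = sum(frequencies)                                   # N
total_fx = sum(f * x for f, x in zip(frequencies, mid_values))  # sum of fX
grouped_mean = total_fx / n_items

# Mean from the unclassified data, using the total of all grades from the text.
ungrouped_mean = 17_912.3 / n_items

print(f"sum of fX = {total_fx:.2f}")
print(f"grouped mean = {grouped_mean:.2f}, ungrouped mean = {ungrouped_mean:.2f}")
```

The two totals differ slightly (17,911.75 against 17,912.3), yet the two means agree when carried to two decimal places.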


The two values for \(\bar{X}\) are exactly the same. It is unusual for
them to be identical, but we can generally count on a difference of not
more than a few per cent at most. The value of the arithmetic mean
computed from a frequency distribution will generally be in close
agreement with the arithmetic mean from the unclassified data if the
variable is continuous and the distribution is symmetrical. If (1) the
distribution is skewed or if (2) the variable is discrete (or if the data
are broken), or if both (1) and (2) are true, the agreement will be less
close. Likewise, close agreement cannot be expected if the data contain
irregularities because an unduly small sample was used.
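
The comparison just made can be checked with a short sketch in Python. The frequencies and class mid-values are those tabulated for the cadet-midshipmen's grades; the function name `grouped_mean` and the variable names are assumptions introduced here for illustration only.

```python
# Class mid-values and frequencies for the cadet-midshipmen's grades
# (classes 72.0-73.9, 74.0-75.9, ..., 88.0-89.9).
mid_values = [72.95, 74.95, 76.95, 78.95, 80.95, 82.95, 84.95, 86.95, 88.95]
frequencies = [7, 31, 42, 54, 33, 24, 22, 8, 4]


def grouped_mean(mid_values, frequencies):
    """Arithmetic mean of a frequency distribution: sum(f * X) / N."""
    n = sum(frequencies)
    total = sum(f * x for f, x in zip(frequencies, mid_values))
    return total / n


# Mean estimated from the frequency distribution (mid-value assumption).
mean_grouped = grouped_mean(mid_values, frequencies)

# Mean from the unclassified data: total of all grades divided by N.
mean_unclassified = 17912.3 / 225

print(round(mean_grouped, 2), round(mean_unclassified, 2))
```

Both figures round to 79.61, although the unrounded values differ slightly, which is what the mid-value assumption leads us to expect.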

Whenever lack of agreement between the two values for \(\bar{X}\) is
present, it is due to the inadequacy of the mid-value assumptions. It is
almost always true that none of the mid-values is actually the true
concentration point of its class. However, a glance at Chart 8.1, 8.2, or
8.3 will suggest that, for groups to the left of the group of maximum
frequency, the mid-value of a group is probably less than the mean of
that group; while for groups to the right of the group of maximum
frequency, the mid-value of a group probably exceeds the mean of that
group. Although all the mid-value assumptions are usually incorrect,
there is a definite tendency for the errors to offset each other, provided
the distribution is approximately symmetrical. For the data of
cadet-midshipmen's grades, we have the unclassified data from which the
frequency distribution was made and we can compute the arithmetic mean
for each class and compare the class means and class mid-values. This has
been done in Table 9.4, where it may be seen that for the first 3 classes
the mid-value of each class is less than the class mean. For the last 5
classes, 2 of the mid-values exceed their class means and 2 of the
mid-values are less than their class means; in the case of one class, the
mid-value and the class mean are the same.

TABLE 9.4

Comparison of the Class Mid-values with the Arithmetic Mean for
Each Class for the Grades of Cadet-Midshipmen

               Number of    Total of grades    Arithmetic    Mid-value
   Grade         cadet-      in each class      mean for      of each
               midshipmen   (from Table 8.4)   each class      class

 72.0-73.9          7             511.0           73.00         72.95
 74.0-75.9         31           2,331.7           75.22         74.95
 76.0-77.9         42           3,236.0           77.05         76.95
 78.0-79.9         54           4,255.6           78.81         78.95
 80.0-81.9         33           2,666.0           80.79         80.95
 82.0-83.9         24           1,991.5           82.98         82.95
 84.0-85.9         22           1,868.8           84.95         84.95
 86.0-87.9          8             696.0           87.00         86.95
 88.0-89.9          4             355.7           88.92         88.95

 Total            225          17,912.3

The arithmetic mean from grouped data: short methods. In
Tables 9.1 and 9.2 it was shown that we could assume a value
\(\bar{X}_d\) for the arithmetic mean and, making use of the fact that
\(\Sigma x = 0\), compute the necessary correction to obtain \(\bar{X}\).
This method will save us appreciable time in computing the mean from a
frequency distribution. The expression for \(\bar{X}\) is as before,
except that the symbol f is introduced because of the frequencies in the
various classes. Thus,

\[ \bar{X} = \bar{X}_d + \frac{\Sigma fd}{N}. \]

The selected value for \(\bar{X}_d\) may be the mid-value of any class.
In Table 9.5 \(\bar{X}_d\) has been taken as the mid-value of the fourth
class, and the computations below the table show that
\(\bar{X} = 79.61\), the same as found by the longer method of Table 9.3.

TABLE 9.5

Computation of the Arithmetic Mean for Grades of
the 1952 Graduating Class of the United States
Merchant Marine Academy by Use of the
Expression \(\bar{X} = \bar{X}_d + \frac{\Sigma fd}{N}\)

               Number of
   Grade         cadet-          d          fd
               midshipmen
                   f

 72.0-73.9          7           - 6        - 42
 74.0-75.9         31           - 4        -124
 76.0-77.9         42           - 2        - 84     (-250)
 78.0-79.9         54             0           0
 80.0-81.9         33           + 2        + 66
 82.0-83.9         24           + 4        + 96
 84.0-85.9         22           + 6        +132
 86.0-87.9          8           + 8        + 64
 88.0-89.9          4           +10        + 40     (+398)

 Total            225                      +148

\[ \bar{X} = \bar{X}_d + \frac{\Sigma fd}{N} = 78.95 + \frac{148}{225}
   = 78.95 + 0.658 = 79.61. \]
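
The short method of Table 9.5 may be sketched in Python as follows. The columns f and d are copied from the table; the function name `mean_from_assumed` is an assumption made for this illustration.

```python
# Columns f and d of Table 9.5; d is measured from the assumed mean 78.95,
# which is the mid-value of the fourth class.
frequencies = [7, 31, 42, 54, 33, 24, 22, 8, 4]
deviations = [-6, -4, -2, 0, 2, 4, 6, 8, 10]
assumed_mean = 78.95


def mean_from_assumed(assumed_mean, deviations, frequencies):
    """Return (sum of f*d, assumed mean + sum(f*d) / N)."""
    n = sum(frequencies)
    sum_fd = sum(f * d for f, d in zip(frequencies, deviations))
    return sum_fd, assumed_mean + sum_fd / n


sum_fd, mean = mean_from_assumed(assumed_mean, deviations, frequencies)
correction = sum_fd / sum(frequencies)

print(sum_fd, round(correction, 3), round(mean, 2))
```

Any other class mid-value could serve as the assumed mean; only the size of the correction would change, not the resulting value of the mean.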

It will be observed that all of the classes of Table 9.5 are of the same
width. When this is true, we may further shorten our computation of
\(\bar{X}\) by taking our deviations from \(\bar{X}_d\) in terms of class
intervals, \(d'\). Our correction \(\frac{\Sigma fd'}{N}\) will then be
in terms of class intervals and must be multiplied by the class interval
\(i\) before being algebraically added to \(\bar{X}_d\). For the
arithmetic mean, then,

\[ \bar{X} = \bar{X}_d + \frac{\Sigma fd'}{N}\, i. \]

The computation of \(\bar{X}\) by this expression is shown in Table 9.6
and yields the same result as given in Tables 9.3 and 9.5. This method
should always be used when a frequency distribution is made up of equal
class intervals. The greater the number of classes and the greater the
number of items included in a frequency distribution, the more time is
saved by this procedure.
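
A minimal sketch of the class-interval method, using the frequencies of the cadet-midshipmen's grades and a class interval of 2. The names `step_deviations` and `mean_step_deviation` are assumptions introduced for illustration.

```python
frequencies = [7, 31, 42, 54, 33, 24, 22, 8, 4]
assumed_mean = 78.95      # mid-value of the fourth class
class_interval = 2.0      # every class is two grade points wide

# d' counts how many classes each class lies below (-) or above (+)
# the class whose mid-value was taken as the assumed mean.
assumed_index = 3
step_deviations = [k - assumed_index for k in range(len(frequencies))]


def mean_step_deviation(assumed_mean, step_deviations, frequencies, interval):
    """Return (sum of f*d', assumed mean + sum(f*d') / N * interval)."""
    n = sum(frequencies)
    sum_fd = sum(f * d for f, d in zip(frequencies, step_deviations))
    return sum_fd, assumed_mean + sum_fd / n * interval


sum_fd_prime, mean = mean_step_deviation(
    assumed_mean, step_deviations, frequencies, class_interval
)
print(step_deviations, sum_fd_prime, round(mean, 2))
```

Because the deviations are small whole numbers, the products f times d' are easier to form by hand than those of the preceding method, which is the saving the text describes.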

TABLE 9.6

Computation of the Arithmetic Mean for Grades of
the 1952 Graduating Class of the United States
Merchant Marine Academy by Use of the
Expression \(\bar{X} = \bar{X}_d + \frac{\Sigma fd'}{N}\, i\)

               Number of
   Grade         cadet-          d'         fd'
               midshipmen
                   f

 72.0-73.9          7           -3         - 21
 74.0-75.9         31           -2         - 62
 76.0-77.9         42           -1         - 42     (-125)
 78.0-79.9         54            0            0
 80.0-81.9         33           +1         + 33
 82.0-83.9         24           +2         + 48
 84.0-85.9         22           +3         + 66
 86.0-87.9          8           +4         + 32
 88.0-89.9          4           +5         + 20     (+199)

 Total            225                      + 74

\[ \bar{X} = \bar{X}_d + \frac{\Sigma fd'}{N}\, i
   = 78.95 + \frac{74}{225} \times 2 = 78.95 + 0.658 = 79.61. \]

The arithmetic mean from grouped data having unequal class
intervals. For a frequency distribution having unequal class intervals,
the computation of \(\bar{X}\) by the method shown in Table 9.6 would be
awkward because fractional values of \(d'\) would be involved. The
appropriate procedure is either that shown in Table 9.3 or that of
Table 9.5. When classes vary in width, the distribution is invariably
skewed, and we must remember that, as skewness increases, the errors in
our mid-value assumptions offset each other less closely. Thus the mean
computed from a frequency distribution having unequal class intervals
may differ markedly from the mean computed from the unclassified data.
Furthermore, as will be discussed at the end of this chapter, the
arithmetic mean of a decidedly skewed distribution is of limited
usefulness. When a frequency distribution, such as that of Table 8.5, has
a class of indeterminate width at one end (or, occasionally, both ends),
there is no indication of the value which should be chosen as
representative of the class. If it is assumed that the indeterminate
group has the same width as the preceding one, the mid-value will usually
be too low. The use of such a mid-value may result in offsetting the
upward bias of the preceding mid-values, but we can never be sure how
much offsetting takes place or that it may not even overbalance the bias.
The reason a class is left indeterminate is usually that it contains a
few items scattered over a wide range of values.

It should be emphasized that the value of the arithmetic mean computed
for a skewed distribution having unequal class intervals is only a
reasonably good approximation. It becomes even less accurate when one
or two indeterminate classes are present. The difficulty involved in the
computation of the mean for such a distribution is completely resolved if
a footnote is added to the table giving the total of the unclassified
data. If this procedure is followed, a single division suffices to give
the value of the arithmetic mean.

Modified forms of the arithmetic mean. Instead of computing
the arithmetic mean for all of a series of items, it may occasionally
suffice to make an approximation by taking the average of the smallest
and largest figures. The result of such a procedure will not differ
greatly from the arithmetic mean if we are dealing with a continuous
variable (or a discrete variable which does not show gaps) the
distribution of which is symmetrical or nearly so. For example,
meteorologists have found that it is not ordinarily necessary to take
hourly temperatures throughout a day and average these 24 readings to
arrive at the daily mean temperature. It ordinarily suffices to average
only the maximum and minimum temperatures. These two readings may be
obtained from the high and low points shown on the graph traced by a
recording thermometer, or they may be had from a thermometer which
automatically records the maximum and the minimum temperatures.

It will be recalled that the data of cadet-midshipmen's grades are skewed
to the right. Consequently we should expect the average of the lowest
and highest grades to exceed the arithmetic mean computed from all of
the grades. Let us determine the average of these two extreme values
and see how far it departs from \(\bar{X}\). The highest grade shown in
Table 8.2 is 89.6, while the lowest grade is 72.1. The average of these
two grades is 80.85. The value of \(\bar{X}\) computed from the
unclassified data was found to be 79.61. Although the discrepancy
resulting from averaging the extremes is only 1.24, or 1.6 per cent, we
should not use this method as an approximation of \(\bar{X}\) unless the
distribution is symmetrical or nearly so.

A second modification of the arithmetic mean is one which will be
referred to again in connection with the measurement of seasonal
movements (Chapter 14). This modification consists essentially either of
ignoring certain items on the basis that they are unusual extreme values,
perhaps resulting from the introduction of a non-homogeneous or
non-comparable factor into the situation, or of dropping one or more of
the highest and lowest values in an array so that only the more typical
values are averaged.

Suppose that a runner has competed in the 100-yard dash in ten track
meets during a season, and that his times were as follows:

10.2, 10.1, 10.0, 10.0, 10.1, 10.0, 9.9, 10.1, 11.4, 10.2 seconds

Now the arithmetic mean of these ten figures is 10.2 seconds, although
only three of the races were run this slow or slower. In the race
represented by the ninth figure above, the runner was spiked and limped
in to finish an extremely poor last. The figure 11.4 does not indicate
his running ability and could quite logically be excluded in arriving at
a mean time which represents this runner's ability. If we average the
other nine figures, we obtain 10.07 seconds as the arithmetic mean for
this runner under normal running conditions. In like fashion, if one race
had been run with a strong wind at the runner's back, his time would be
abnormally short for the 100 yards and that figure, too, might be
omitted.* The procedure just described differs from the one followed in
measuring seasonal movements in that only the particular values for which
a specific reason could be definitely assigned have been eliminated. When
measuring seasonal movements, we shall drop one, two, or more items at
both ends of an array in order to average the items which seem to cluster
around some central value.
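
The runner illustration may be checked with the following sketch. The helper names `mean` and `mean_excluding` are assumptions made for this example; the excluded race is identified by its position in the list, since it is dropped for a stated reason and not merely because it is large.

```python
times = [10.2, 10.1, 10.0, 10.0, 10.1, 10.0, 9.9, 10.1, 11.4, 10.2]


def mean(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)


def mean_excluding(values, excluded_positions):
    """Mean of the items whose zero-based positions are not excluded."""
    kept = [v for i, v in enumerate(values) if i not in excluded_positions]
    return mean(kept)


mean_all = mean(times)

# The ninth race (zero-based position 8) is the one in which the runner
# was spiked, so it is left out of the modified mean.
mean_normal = mean_excluding(times, {8})

# Number of races run as slowly as, or more slowly than, the mean of all ten.
slow_races = sum(1 for t in times if t >= round(mean_all, 1))

print(round(mean_all, 2), round(mean_normal, 2), slow_races)
```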

* A discussion of this type of modified mean when used in connection with
time studies is given in F. E. Croxton and D. J. Cowden, Practical
Business Statistics, 2nd ed., Prentice-Hall, New York, 1948, pp. 17: 176.

Averaging percentages. It was pointed out in Chapter 7 that a
series of percentages based on different numbers should ordinarily be
averaged by weighting each percentage in proportion to its base. There
are conditions, however, under which we might want to ignore the
different bases and to average several percentages using a different
system of weights. For example, let us assume that a student has taken
two comprehensive examinations, each covering one-half of the subject
matter of a course. Suppose that the first examination included 100
"true-false" questions, upon which he made 82 per cent, while the second
included 150 such questions, upon which he made 88 per cent. Since each
percentage represents a level of accomplishment for one-half of the work
of a term, a better description of the work of the student for the term
would weight the two percentages equally, resulting in an average of

\[ \frac{82 + 88}{2} = 85 \text{ per cent} \]

rather than weight the percentages according to the number of questions
asked, giving

\[ \frac{(100 \times 82) + (150 \times 88)}{250} = 85.6 \text{ per cent.} \]

If the second examination had been based upon 10 "essay" questions, it
is even more apparent that the weighting should not be determined by the
number of questions included.
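
The two systems of weights can be compared with a small sketch. The function `weighted_mean` is a hypothetical helper written for this illustration.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


percentages = [82, 88]          # scores on the two examinations

# Each examination covers one-half of the course: equal weights.
equal_weights = weighted_mean(percentages, [1, 1])

# Weighting by the number of questions asked (100 and 150).
by_questions = weighted_mean(percentages, [100, 150])

print(equal_weights, round(by_questions, 1))
```

The choice between the two results rests on what each percentage is meant to represent, not on the arithmetic, which is the same in both cases.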

Averaging averages. The general outlines of the problem of
averaging averages are the same as those involved in averaging
percentages. If we have several averages, each referring to a category,
and wish to average these averages in order to arrive at a statement
compatible with that referring to the total composed of these categories,
it is necessary to weight each average according to the importance of its
category. For example, if seven football linemen averaged 210 pounds in
weight and four backfield players averaged 186 pounds, we might add the
two means and divide by 2, obtaining 198 pounds. That, however, is not
the correct arithmetic mean for the weights of the eleven players. We
obtain the correct figure from

\[ \frac{(7 \times 210) + (4 \times 186)}{11} = \frac{2{,}214}{11}
   = 201 \text{ pounds.} \]

This is the figure we would get if we added the individual weights for
the eleven players and divided by eleven.

As in the case of percentages, there may be some instances in which the
importance of each category is dependent upon some factor other than
the number of items included in the category. Suppose that 12 tires have
been run on a group of test trucks unloaded except for the driver, and
have shown an average mileage of 13,618 miles. Suppose that 20 similar
tires have been used on a similar group of test trucks each carrying the
driver and 2,000 pounds of load, and have shown an average mileage of
12,136 miles. The weighted average of mileage would be

\[ \frac{(12 \times 13{,}618) + (20 \times 12{,}136)}{32}
   = 12{,}692 \text{ miles.} \]

What we have done is to assign \(\frac{20}{12} = 1.67\) times as much
weight to the second average as to the first. Actually, trucks sometimes
travel unloaded, sometimes loaded, sometimes partly loaded, and sometimes
overloaded. If the trucks in our illustration travel 1/5 of their mileage
unloaded and 4/5 of their mileage loaded, we should arrive at our average
by

\[ \frac{(1 \times 13{,}618) + (4 \times 12{,}136)}{5}
   = 12{,}432 \text{ miles.} \]

It is the importance of the various load conditions in the use of the
truck which should be considered in weighting rather than the number of
tires tested.
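
The football and tire illustrations both reduce to a weighted mean of averages; only the weights differ. A brief sketch follows, in which `weighted_mean` is a hypothetical helper and not part of any library.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Football players: weight each group mean by the number of players.
team_mean = weighted_mean([210, 186], [7, 4])
unweighted_team_mean = (210 + 186) / 2

# Tires: weighting by the number of tires tested (12 unloaded, 20 loaded).
tires_by_count = weighted_mean([13618, 12136], [12, 20])

# Tires: weighting by share of mileage (1/5 unloaded, 4/5 loaded).
tires_by_use = weighted_mean([13618, 12136], [1, 4])

print(round(team_mean), unweighted_team_mean)
print(round(tires_by_count), round(tires_by_use))
```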


THE MEDIAN

The median from ungrouped data. The median is usually
defined as that value which divides a distribution so that an equal
number of items is on either side of it. If we have five items, $5, $6,
$7, $8, $10, it is apparent that the value of the median is $7, since
there are two items below that value and two items above it. If we have
six items, 2 inches, 5 inches, 6 inches, 7 inches, 9 inches, 12 inches, it
is clear that any value greater than 6 inches and less than 7 inches will
satisfy our definition. As a matter of practice, when there is an even
number of items, we usually take the value of the median as halfway
between the two central items. In this instance the median would be 6.5
inches.

If we are dealing with a series of values such as 12, 13, 14, 15, 15, 17,
and 18 pounds, there is no value which is so located that three items are
smaller than it and three items are larger than it. We would, however,
designate 15 pounds as the median. It must be obvious that the definition
first given does not hold for situations such as this. The definition
is therefore recast thus: the median is that value which divides a series
so that one-half or more of the items are equal to or less than it and
one-half or more of the items are equal to or greater than it.

From what has already been said, it is obvious that the median cannot
readily be located unless the data have been put into an array or, as we
shall see shortly, into a frequency distribution. It will be recalled that
no arranging is necessary for computing the mean, since the items of a
series may be totaled no matter what their order.

The value of the median of a series may or may not coincide with the
value of an existing item. When there is an odd number of items in an
array, the value of the median coincides with that of one of the items;
when there is an even number of items in an array, it ordinarily does
not, since the median is then taken halfway between the two central
items.

An important property of the median, which will be referred to again,
is that it is influenced by the position of the items in the array but not
by the size of the items. It has already been observed that the median
of $5, $6, $7, $8, $10 is $7. The two larger items may have any values
greater than $7 and the two smaller items may have any values smaller
than $7, yet the median remains $7.
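
The rule for ungrouped data, and the property just described, may be sketched as follows. The function `median` is written out here for illustration (it is an assumption of this sketch, not the only possible formulation): the data are arrayed first, and the halfway value is taken when the number of items is even.

```python
def median(values):
    """Median of ungrouped data: array the items, then take the middle one
    (odd count) or the value halfway between the two central items (even)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


dollars = median([5, 6, 7, 8, 10])
inches = median([2, 5, 6, 7, 9, 12])
pounds = median([12, 13, 14, 15, 15, 17, 18])

# Changing the size of the outer items, but not their position relative
# to the central item, leaves the median where it was.
altered = median([1, 2, 7, 800, 10000])

print(dollars, inches, pounds, altered)
```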

Before proceeding to a consideration of the computation of the median
for grouped data, let us compute the value of the median for the grades
of the 225 cadet-midshipmen arrayed in Table 8.2. We want to find the
value which is so located that 112 items will be on either side of it.
This is, of course, the value of the 113th item,* and counting from
either end reveals that the value of the median is 79.0. If we had an
array of 200 items, we should find the value which divides the
distribution so that 100 items fall below and 100 above it. This is
obviously the mean of the 100th and 101st items counted from either end
of the array.

* For ungrouped data it may seem convenient to find the value of the
median by counting \(\frac{N+1}{2}\) items, beginning with the highest
(or lowest) item in the array. This is not the same as saying that the
median is the \(\frac{N+1}{2}\)th item. Although some persons hold this
concept, it is not satisfactory. The concept of the middle item as the
median is unsatisfactory when the array consists of an even number of
items, and must be abandoned when the median is determined from grouped
data.

The median from grouped data. To determine the value of the
median of a frequency distribution, we count half of the frequencies from
either end of the distribution in order to ascertain the value on either
side of which half of the frequencies fall. To determine the value of the
median for the grades of the cadet-midshipmen (Table 9.6), we first
compute \(\frac{N}{2} = 112.5\). We then proceed to ascertain the value
of the median. There are 80 frequencies included in the first three
classes of the distribution. The estimated value of the median is
therefore obtained by interpolating 32.5 frequencies (112.5 - 80) into
the fourth class, assuming that the frequencies in that class are evenly
distributed within the class. The median, then, is given by the
expression

\[ \text{Med} = 77.95 + \frac{32.5}{54} \times 2 = 77.95 + 1.20 = 79.15. \]

Exactly the same result is obtained if we begin our computations from the
other end of the distribution. There are 91 frequencies included in the
last five classes, and we proceed to interpolate 21.5 frequencies
(112.5 - 91) into the fourth class, from the upper limit toward the lower
limit. The result is

\[ \text{Med} = 79.95 - \frac{21.5}{54} \times 2 = 79.95 - 0.80 = 79.15. \]

The value of the median is, of course, the same whether we begin our
computations from one end or the other.
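
The interpolation from either end may be sketched as below. The lower class limits 71.95, 73.95, and so on follow from the class limits used in the computation above; the two function names are assumptions made for this illustration, and the sketch supposes classes of equal width.

```python
frequencies = [7, 31, 42, 54, 33, 24, 22, 8, 4]
lower_limits = [71.95, 73.95, 75.95, 77.95, 79.95, 81.95, 83.95, 85.95, 87.95]
width = 2.0


def median_from_below(lower_limits, frequencies, width):
    """Count N/2 frequencies upward from the lower limit of the first class."""
    half = sum(frequencies) / 2
    cumulative = 0
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and cumulative + f >= half:
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("the distribution has no frequencies")


def median_from_above(lower_limits, frequencies, width):
    """Count N/2 frequencies downward from the upper limit of the last class."""
    half = sum(frequencies) / 2
    cumulative = 0
    for lower, f in zip(reversed(lower_limits), reversed(frequencies)):
        if f > 0 and cumulative + f >= half:
            upper = lower + width
            return upper - (half - cumulative) / f * width
        cumulative += f
    raise ValueError("the distribution has no frequencies")


below = median_from_below(lower_limits, frequencies, width)
above = median_from_above(lower_limits, frequencies, width)
print(round(below, 2), round(above, 2))
```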

The value of 79.15 just obtained for the median from the frequency
distribution is in very close agreement with that of 79.0 found from the
array. Unless the data contain gaps or irregularities, we can expect
rather close agreement when dealing with a continuous variable, and
likewise for a discrete variable if the data are not broken.

We have now computed the values of the arithmetic mean and the
median for the frequency distribution of cadet-midshipmen's grades.
The mean was 79.61. The median was 79.15. The mean exceeds the
median because the distribution is skewed to the right. If a distribution
is exactly symmetrical, the mean and the median are identical. If a
distribution is skewed to the left, the mean will be less than the median.
This point will be treated more fully at the end of this chapter and in
the following chapter. In Chapter 10 we shall see that one way of
measuring skewness involves consideration of the values of the mean and
the median.

The computation of the median from a frequency distribution of
unequal class intervals does not differ from that just described. Neither
does the presence of indeterminate groups at either or both ends
complicate the procedure.

If an ogive of a distribution is plotted, it is possible to obtain the
value of the median graphically, as shown in Chart 9.1. The process is
the graphic equivalent of the computations already made and consists of
the following steps: (1) Compute \(\frac{N}{2}\) and locate this point on
the vertical scale. (2) Draw a perpendicular to the Y-axis at this point
and extend the perpendicular to intersect the ogive. (3) At the
intersection, drop a perpendicular to the X-axis. The intersection with
the X-axis gives the value of the median. From Chart 9.1 it is seen that,
for the grades of the cadet-midshipmen, the value of the median, located
graphically, is 79.2, which is in close agreement with that computed
arithmetically.

Chart 9.1. Graphic Location of the Median for Grades of the
1952 Graduating Class of the United States Merchant Marine
Academy. Data of Table 9.6. (Vertical scale: number of
cadet-midshipmen.)

The quartiles, quintiles, deciles, and percentiles. The median
characterizes a series of values because of its midway position. There
are several other measures of the frequency distribution which, taken
individually, are not measures of central tendency but, as we shall see
later, may be used to assist in measuring dispersion and skewness. They
are, however, allied to the median in that they are based upon their
position in a series. We shall therefore digress at this point to discuss
the quartiles, quintiles, deciles, and percentiles.

There are three quartiles, \(Q_1\), \(Q_2\), and \(Q_3\), which divide
the distribution into four equal parts. \(Q_2\) is, of course, the median
and is generally so designated. To determine the value of \(Q_1\), the
first or lower quartile, for the data of cadet-midshipmen's grades, we
count \(\frac{N}{4} = \frac{225}{4} = 56.25\) frequencies from the lower
limit of the first class. Since there are 38 frequencies in the first two
classes, 18.25 frequencies are interpolated into the third class. Thus
for the value of \(Q_1\) we have

\[ Q_1 = 75.95 + \frac{18.25}{42} \times 2 = 76.82. \]

The same result may be obtained by counting \(\frac{3N}{4}\) from the
upper limit of the last class.

The value of the third quartile \(Q_3\) may be computed by counting
\(\frac{3N}{4}\) from the lower limit of the first class or, more
expeditiously, by counting \(\frac{N}{4}\) from the upper limit of the
last class. Since \(\frac{N}{4} = 56.25\), and since there are 34
frequencies in the last three classes, we have

\[ Q_3 = 83.95 - \frac{22.25}{24} \times 2 = 82.10. \]

There are four quintiles, which divide the distribution into five equal
parts; nine deciles, which divide the distribution into ten equal parts;
and ninety-nine percentiles, which divide the distribution into 100 equal
parts. The procedure for computing these values is similar to that for
the median and the quartiles. For example, we shall compute the value of
the 3rd decile, which is also the 30th percentile. We count
\(\frac{3N}{10} = 67.5\) frequencies from the lower limit of the first
class and interpolate. Since there are 38 frequencies in the first 2
groups, we have

\[ 75.95 + \frac{29.5}{42} \times 2 = 77.35. \]

Unless a distribution is very extensive, there would be no purpose served
in computing very many of the percentiles. Frequent use is made of
only a few of them, such as the 99th, 98th, 95th, 90th, 85th, 80th, and so
forth.
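
Since the quartiles, deciles, and percentiles are all found by the same counting and interpolating, one routine serves for all of them. The following is a minimal sketch; the function name `grouped_quantile` and its argument `fraction` are assumptions introduced here, and classes of equal width are supposed for simplicity.

```python
frequencies = [7, 31, 42, 54, 33, 24, 22, 8, 4]
lower_limits = [71.95, 73.95, 75.95, 77.95, 79.95, 81.95, 83.95, 85.95, 87.95]
width = 2.0


def grouped_quantile(lower_limits, frequencies, width, fraction):
    """Count fraction * N frequencies from the lower limit of the first
    class and interpolate within the class where the count is completed."""
    target = fraction * sum(frequencies)
    cumulative = 0
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")


q1 = grouped_quantile(lower_limits, frequencies, width, 0.25)
q2 = grouped_quantile(lower_limits, frequencies, width, 0.50)
q3 = grouped_quantile(lower_limits, frequencies, width, 0.75)
d3 = grouped_quantile(lower_limits, frequencies, width, 0.30)

print(round(q1, 2), round(q2, 2), round(q3, 2), round(d3, 2))
```

Counting from the lower limit throughout gives the same values as counting from the upper limit, as was done for the third quartile above.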

The terms quartile, quintile, decile, and percentile are sometimes used in
a different sense, to refer to the part of the distribution in which an
item falls. Thus, if a student is said to be in the upper quartile of his
class, he is in the upper 25 per cent. If he is in the upper decile of
his class, he is in the upper 10 per cent. It would undoubtedly lead to
clarity of expression if we reserved quartiles, quintiles, deciles, and
percentiles to mean the measures discussed at the opening of this
section. To refer to the part of a distribution in which a student falls,
we could say "highest quarter" (above \(Q_3\)), "second highest quarter"
(between \(Q_2\) and \(Q_3\)), "third highest quarter" (between \(Q_1\)
and \(Q_2\)), and "lowest quarter" (below \(Q_1\)). Similarly, we could
say "fifths" in place of quintiles, "tenths" instead of deciles, and
"hundredths" instead of percentiles.


THE MODE

The mode from ungrouped data. The mode of a distribution is the
value at the point around which the items tend to be most heavily
concentrated. It may be regarded as the most typical of a series of
values. For this very reason it is apparent that the occurrence of one or
a few extremely high (or low) values has no effect upon the mode.* If a
series of data is unclassified, not having been either arrayed or put
into a frequency distribution, the mode cannot be readily located.

Taking first an extremely simple illustration: If seven men are receiving
daily wages of $5, $6, $7, $7, $7, $8, $10, it is clear that the modal
wage is $7 per day. If we have a series of values such as

3, 5, 6, 7, 9, 10, 11,

it is apparent that there is no mode.
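
For ungrouped data the mode may be located simply by counting how often each value occurs. A minimal sketch follows; the function name `modes`, and the convention of returning an empty list when no value is repeated, are assumptions of this illustration.

```python
from collections import Counter


def modes(values):
    """Return the most frequently occurring value(s), or an empty list
    when every value occurs only once (no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)


wages = modes([5, 6, 7, 7, 7, 8, 10])
no_mode = modes([3, 5, 6, 7, 9, 10, 11])

# An extremely high wage does not disturb the mode.
with_extreme = modes([5, 6, 7, 7, 7, 8, 1000])

print(wages, no_mode, with_extreme)
```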

The mode from grouped data. If we examine the array of
cadet-midshipmen's grades shown in Table 8.2, we find that it would be
very difficult to determine the value around which the items tend to
concentrate. The mode may be located readily by means of a frequency
distribution such as Table 9.6. Here it is clear that the modal group is
78.0-79.9; and if we take the mid-value as representative of the class,
we should call 78.95 the mode.

However, there is evidence here that the mid-value is not the best
estimate of the mode. Since there are more frequencies in the class
preceding the modal class than there are in the class following the modal
class, it is logical to expect that the actual concentration is toward the
lower limit of the class. We shall make use of the frequencies in these
two adjacent classes to infer the probable concentration point within the
modal class. The expression is

\[ Mo = l_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2}\, i, \]

where \(l_1\) = the lower limit of the modal class;
      \(\Delta_1\) = the difference between the frequency of the modal
            class and the frequency of the preceding class (sign
            neglected);
      \(\Delta_2\) = the difference between the frequency of the modal
            class and the frequency of the following class (sign
            neglected);
      \(i\) = the interval of the modal class.

* This is true in respect to the usual method of locating the mode which
is described here. If the mode is located by the expression

\[ \text{Mode} = \bar{X}
   - \frac{s\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}, \]

or by determining the X value just below the peak of a fitted curve, the
extreme values do have some slight influence. The computation of \(s\),
\(\beta_1\), and \(\beta_2\) is discussed in the following chapter.



For the frequency distribution of grades of the cadet-midshipmen,

\[ Mo = 77.95 + \frac{54 - 42}{(54 - 42) + (54 - 33)} \times 2
      = 77.95 + \frac{12}{33} \times 2 = 78.68. \]

The interpolation which we have made may be illustrated graphically as
shown in Chart 9.2. It should be realized that we are merely making an
estimate of the value of the mode. Nevertheless, it is a useful estimate,
and it should be remembered that the mode has two important properties:
first, that it represents the most typical value of the distribution and
should coincide with existing items; second, that the mode (as usually
computed) is not affected by the presence of extremely large or small
items.

Graphically we may obtain the mode from a column diagram, as in
Chart 9.2. We may make a very rough approximation of the mode by reading
the value on the X-axis corresponding to the highest point of the
frequency curve or corresponding to the steepest portion of the ogive.
The curves may be smoothed freehand, since, unless the series has been
subjected to a smoothing process, we would obtain a value about the same
as the mid-value of the modal group.
Chart 9.2. Diagrammatic Illustration of the Method of Interpolating
for the Value of the Mode. \(\Delta_1\) exerts an upward influence, and
\(\Delta_2\) exerts a downward influence, each in proportion to its
magnitude, so that the mode divides the interval of the modal class into
two parts proportional to \(\Delta_1\) and \(\Delta_2\). That is,

\[ \frac{Mo - l_1}{l_2 - Mo} = \frac{\Delta_1}{\Delta_2}. \]

Geometrically, the mode may be located by dropping a vertical line from
the intersection of the two diagonals as shown on the diagram.

Algebraically the expression

\[ Mo = l_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2}\, i \]

may be developed as follows:

We wish to locate the mode so that

\[ \frac{Mo - l_1}{l_2 - Mo} = \frac{\Delta_1}{\Delta_2}, \]
\[ \Delta_2 Mo - \Delta_2 l_1 = \Delta_1 l_2 - \Delta_1 Mo, \]
\[ \Delta_1 Mo + \Delta_2 Mo = \Delta_1 l_2 + \Delta_2 l_1, \]
\[ Mo(\Delta_1 + \Delta_2) = \Delta_1 l_2 + \Delta_2 l_1. \]

But \(l_2 = l_1 + i\). Therefore

\[ Mo = \frac{\Delta_1 l_1 + \Delta_1 i + \Delta_2 l_1}{\Delta_1 + \Delta_2}
      = \frac{l_1(\Delta_1 + \Delta_2)}{\Delta_1 + \Delta_2}
        + \frac{\Delta_1 i}{\Delta_1 + \Delta_2}
      = l_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2}\, i. \]

Upon occasion, series are encountered which have two modes and are
referred to as bi-modal. Such a series is pictured in Chart 9.3. Sometimes
bimodality is the result of chance; sometimes it results because of the
fact that two sets of non-homogeneous data are present. In Chart 9.3 the
two concentrations are attributable to the fact that some drivers were on
full- (or nearly full-) time work, while others were working only one or
two days a week.

CHARACTERISTICS OF THE MEAN, MEDIAN, AND MODE 

Before proceeding to a consideration of other measures of central tend- 
ency, we shall examine the characteristics of these three relatively simple 
and very important measures. 

Chart 9.3. Distribution of Wages Received in Half Month by
Drivers in Bituminous Coal Mines, Illinois, 1933. Data from United
States Bureau of Labor Statistics, Wages and Hours of Labor in
Bituminous-Coal Mining: Bulletin No. 601, p. 61. (Vertical scale: number
of drivers.)


Familiarity of the concept. The arithmetic mean is the most
widely used of all the measures of central tendency. As will be pointed
out later, it is frequently used under conditions which cause it to be
misleading. The median is less well known than the arithmetic mean, but
it is based on a simpler concept. Also less well known than the
arithmetic mean, the concept of the mode as the most usual or typical of
a group of items is probably the simplest of the three.

The concepts of the three measures may be illustrated by means of the
three parts of Chart 9.4. The mean is at the point of balance, or center
of gravity, such that \(\Sigma fx\) on one side of the mean equals
\(\Sigma fx\) on the other side, where x denotes the deviation of an item
from the mean. The median divides the curve into two equal areas. The
mode is the value below the peak of the curve.




Chap. 9]   MEASURES OF CENTRAL TENDENCY


Chart 9.4. Location of the Arithmetic Mean, the Median, and the Mode
in a Frequency Distribution Skewed to the Right.
A. The values to the right of \(\bar{X}\) balance the values to the left
of \(\bar{X}\).
B. One half of the area under the curve is on each side of the ordinate
erected at the median.
C. The mode is directly beneath the peak of the curve.

Algebraic treatment. The arithmetic mean may be treated
algebraically:

(a) Since \(\bar{X} = \frac{\Sigma X}{N}\), it follows that, if any two
of the three factors (the total, the arithmetic mean, the number of
items) are known, the third may be computed. Thus

\[ \bar{X} = \frac{\Sigma X}{N}; \qquad N = \frac{\Sigma X}{\bar{X}};
   \qquad \Sigma X = N\bar{X}. \]

(b) Using appropriate weights, a series of arithmetic means may be
averaged to yield the arithmetic mean of all the data on which those
means were based.

The median does not lend itself to the type of algebraic treatment
discussed for the arithmetic mean. Algebraic treatment of the mode,
similar to that sketched for the mean, is not possible.

Need for classifying data. The arithmetic mean may be computed
from unclassified data, from arrayed data, from the frequency
distribution, or (as noted above) merely from a knowledge of the total
\(\Sigma X\) and the number of items \(N\). When the arithmetic mean is
computed from a frequency distribution, the value of \(\bar{X}\) will
very closely approximate the value of \(\bar{X}\) for the unclassified
data. The more nearly symmetrical the distribution, the closer the
agreement of these two values.

In order that the value of the median may be computed, the data must be
in an array (at least the central items must be arrayed) or in a
frequency distribution. The median determined from the frequency
distribution will agree approximately with that computed from the array
if the distribution of items is regular within the class containing the
median.

The mode is most readily located from the frequency distribution, and
only with some difficulty from an array. King* has pointed out that an
array of the cities of the United States according to population of each
would show no mode. However, if such data were put into classes, a 
modal tendency might appear. It should be borne in mind that the 
process of interpolating for the modal value within the modal group is at 
best only an approximation. More refined methods of locating the mode 
involve essentially the smoothing of the data by formula and the deter- 
mination of the X value of the maximum ordinate. 

Effect of unequal class intervals. When classes vary in width, the
value of the arithmetic mean may be computed. Such a variation of
class intervals is necessitated by the presence of marked skewness (almost
invariably to the right, or positive) resulting in a value for \(\bar{X}\) which may
not be in close agreement with that based on the unclassified data. The
value of \(\bar{X}\) from such a positively skewed frequency distribution would be
expected to exceed the value of \(\bar{X}\) from the unclassified data.

The median may ordinarily be determined rather satisfactorily from a 
frequency distribution having varying class intervals. The upper quar- 
tile or one or more of the upper quintiles or deciles might, however, fall in 
a wide class having few frequencies. The necessary interpolation would
in such a case be unreliable. 

When the class intervals of a frequency distribution vary in width, the
mode may be satisfactorily located if the modal group and those on either 
side of it are of the same width. Otherwise the determination is apt to 
be of limited accuracy. 

Effect of classes with open end. The presence of a “Less than
. . .” class at one end of a frequency distribution and/or an “. . . or
more” class at the other end results in an inaccurate determination of
\(\bar{X}\), since mid-values ordinarily cannot be satisfactorily determined for
such classes.

The presence of open-end classes has no effect upon the determination 
of the median. 

Indeterminate groups do not complicate the process of locating the
modal value. Occasionally, as when working with an extremely skewed
or a reverse J-shaped distribution, the mode is at or near the end of the
distribution. Under such conditions there would be no reason for having
an indeterminate group at that end of the distribution. Incidentally, in
the case of such distributions, the mode is not a measure of central
tendency.

* Willford I. King, The Elements of Statistical Method, The Macmillan Company, New
York, 1919, p. 126.

Effect of skewness. For a symmetrical distribution, the mean, 
median, and mode are identical. If the symmetrical distribution is 
altered by merely extending one tail so that the distribution is skewed, 
there is no necessary change in the value of the mode (as usually com- 
puted), but the median is changed in the direction of the skewness. Thus 
positive skewness (skewness to the right) increases the value of the 
median. The mean is increased even more, since it is affected not only 
by the fact that there is now an excess of frequencies on one side of the 
mode, but also by the amount by which the various excess frequencies 
deviate from the mode. Although the distribution of grades of the cadet- 
midshipmen is only slightly skewed, the effect of the presence of skewness 
is seen when we recall that the mode is 78.08, the median is 79.15, and the 
mean is 79.61. These values are shown on Chart 10.7. 

Effect of extreme values. When skewness is not general but is due to
a few items deviating a great deal from the mode, the median will be only
slightly affected. The arithmetic mean, however, is affected by the value
of every item in the series, and the presence of one or a few extremely
large (or extremely small) items in a series may result in a mean which is
very misleading. As ordinarily computed, the mode is not at all
influenced by the presence of a few unusually high (or low) extreme values.

The foregoing is of such great importance that we shall give further
attention to it. Suppose we have the following series of seven values,

$12, $14, $15, $15, $16, $18, $19,

the mean of which is $15.57, the median $15, and the mode $15. If an
extreme value of $25 is added to these seven, the arithmetic mean
becomes $16.75, the median $15.50, while the mode remains $15. Now if,
instead of having added $25 as the eighth item, we add $200, the mean
becomes $38.62, but the median is still $15.50 and the mode $15. The
effect upon the median of any added value from $16 to ∞ is the same.
The mode was not at all affected by the extreme value, although, if we had
added a $16 item, it would have been affected. This illustrates a different
point, also; namely, that the mode is not a useful measure unless it is
based upon enough items to show a well-defined concentration.
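The behavior of the three measures in this illustration may be checked with a short sketch. It assumes Python's standard `statistics` module; the helper name `summarize` is ours, not part of the text, and the mode is reported as a list because a series may have more than one.

```python
from statistics import mean, median, multimode

# The seven values of the illustration, in dollars.
base = [12, 14, 15, 15, 16, 18, 19]

def summarize(values):
    """Return (mean rounded to cents, median, list of modes)."""
    return round(mean(values), 2), median(values), multimode(values)

print(summarize(base))          # the original seven values
print(summarize(base + [25]))   # an eighth value of $25
print(summarize(base + [200]))  # an eighth value of $200
print(summarize(base + [16]))   # a $16 item creates a second mode
```

Only the mean responds to the size of the added value; the median responds merely to its position, and the mode changes only when the added item disturbs the point of concentration.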

Because of the effect of extreme values upon the arithmetic mean, it is
sometimes a misleading figure to use to describe a distribution. If we are
considering the income of a group of people, and if most of them have
moderate incomes but one or a few have extremely high (or low) incomes,
the mean will reflect these extremes and to that extent will be atypical
rather than typical. An alumni association made a study of graduates
who had been out of college 20 years. Among other questions asked was
one concerning income during a specific year. More than 850
questionnaires were sent out; only 133 replies were received. There is a large
probability that these replies were selective and any figures derived
therefrom would be of doubtful value. The mean income of the 133 replying
was $13,958, but this high average was due to the fact that there were
several very large incomes which were definitely extreme values. The
median income was $7,500, while the mode was very close to $5,000. In
such a case as this, we should not use the mean alone to describe the
distribution. If only one figure is to be used, it is better to use the median
or mode, depending upon which concept is of more importance. It
would be much better, of course, to give all three values, and, if possible,
a frequency distribution or a frequency curve.

Sometimes in dealing with a series in which suspected heterogeneity is
present, it may be advisable to use the median in lieu of the arithmetic
mean. For example, measurements might have been taken of the weight
of a number of goldfish, and the figures may reveal the presence of several
unusually large specimens. It is suspected that, because of ignorance or
carelessness, the enumerator included a few carp with the goldfish. The
questionable values could be discarded. However, we are not sure that
the heavy fish were carp, and perhaps their measurements should not be
discarded. The use of the median allows the extreme values to be
represented by their position in the series rather than by their size.

Sometimes we have a series in which there are present extremes of which
we know the number but not the individual values. In such a situation
we can determine the median or the mode, but not the mean.

When we have a series of values extending over a great range, any
concept of a measure of central tendency is dubious. Suppose we have the
values 4, 6, 2,000, and 2,100. It is obvious that a mean or a median could
be computed, but that neither would have any practical meaning.

Effect of irregularity of data. When data are broken or irregular,
the value of the mean computed from a frequency distribution may be
decidedly different from the value based on the unorganized data.

The same is true in the case of the median if gaps occur among the items
falling in the class containing the median. When gaps occur in the
vicinity of the median, the median is not a particularly good concept to
use, as its value would be erratic if one or two items were added to or
subtracted from the series.

If a mode is clearly defined, there are not likely to be gaps near that
value. When gaps are present near the mode, it is quite likely that there
are too few items in the series for the mode to be either clearly defined or
meaningful.

Reliability when based on samples. In Chapter 24 we shall discuss
the variation which may be expected in values of the arithmetic mean 
when based on repeated random samples. This volume will not treat 
of the sampling variation of medians or modes. However, for samples 
of the same size from a normal population, the median is subject to 
greater sampling variation than is the arithmetic mean, and the mode is 
more variable than the median. 

Mathematical properties. The arithmetic mean has two important
properties: first, \(\Sigma x = 0\); and second, \(\Sigma x^2\) = a minimum. Because of
this latter property, the mean is the usual basis of reference for measures
of dispersion. The mean is an important function in many processes
which will follow in later sections of this book. Among other uses, it is
essential for fitting the normal curve to observed data.

The sum of the deviations from the median (signs neglected) is a
minimum. For this reason, certain measures of dispersion are sometimes
based upon the median.
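These properties can be verified numerically. The sketch below is only an illustration: the eight values are chosen for convenience, the function names are ours, and the trial centers are arbitrary points on either side of the mean.

```python
from statistics import mean, median

values = [12, 14, 15, 15, 16, 18, 19, 25]  # illustrative values only
center_mean = mean(values)
center_median = median(values)

def sum_dev(center):
    """Sum of deviations from a center, signs retained."""
    return sum(v - center for v in values)

def sum_sq_dev(center):
    """Sum of squared deviations from a center."""
    return sum((v - center) ** 2 for v in values)

def sum_abs_dev(center):
    """Sum of deviations from a center, signs neglected."""
    return sum(abs(v - center) for v in values)

# Arbitrary trial centers on both sides of the mean.
candidates = [center_mean + d for d in (-2, -1, -0.5, 0.5, 1, 2)]

print(sum_dev(center_mean))
print(sum_sq_dev(center_mean), [sum_sq_dev(c) for c in candidates])
print(sum_abs_dev(center_median), [sum_abs_dev(c) for c in candidates])
```

For a series with an even number of items, any center between the two middle items gives the same minimum sum of absolute deviations, so the comparison for the median is "not greater than" rather than "less than."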

Selection of appropriate measure. Using the foregoing measures
as descriptive devices, the statistician may be faced with the problem of
deciding which one to use to characterize a given set of data. In general,
the measure of central tendency that he should use depends upon (1) the
nature of the distribution of the data and (2) the concept of central
tendency which is desired for a particular purpose.

If the distribution is symmetrical, or approximately so, the three
measures may be used almost interchangeably. If a series is skewed, we must
bear in mind that the arithmetic mean is frequently not a typical value,
and that it may be better to use the mode (which is typical) or the
median. When there are extreme deviations or when there is suspected
heterogeneity, we may use the median in place of the mean, or recourse
may be had to a modified mean.

If \(\bar{X}\) is computed, use may be made of that value to obtain a total.
Thus, if adults average 150 pounds in weight, it is safe to load about 20
people in an elevator rated to carry 3,000 pounds. (The figure of 150
pounds is somewhat high for the average weights of adults, but it is the
figure frequently used to compute elevator capacity. It is obvious that
the 20 people referred to should not all be heavy persons.) If subsequent
computations are to be made involving a measure, the mean may be
required. If a curve is to be fitted to a frequency distribution, the mean
will probably be used. If one series of data is eventually to be compared
with another in respect to dispersion, the mean may be needed. This,
however, does not mean that the median or the mode should not be used
for describing either or both of the series.

The relative standing of a person in a class may be indicated by stating
whether his grade is better than the grades of half of the members. This
rating involves the use of the median. Other statements referring to 
various proportions of the students may be made by using quartiles, 
quintiles, deciles, or percentiles. 

If we are interested in knowing the typical annual expenditure of motor- 
ists for gasoline, we should make use of the mode. 

Since the three measures embody different concepts, it may sometimes 
be advisable to use two or possibly all three. The use of the mean and 
the mode, or the mean and the median, gives us an idea of the amount of 
skewness present, as will be shown in the next chapter. 

Sometimes it is necessary to make a quick estimate of the central
tendency of a series. Under such conditions, the mode may be promptly
estimated from a frequency distribution, and the median may be quickly
approximated from either an array or a frequency distribution. Of
course, if the total and the number of items are given, the arithmetic
mean may be computed in a few seconds.

MINOR MEANS 

The arithmetic mean, median, and mode are frequently thought of as
the more important measures of central tendency, because of their wide
usefulness, simplicity, and general applicability. Under certain
conditions other measures of central tendency may be useful, and we shall
therefore consider the geometric mean and the harmonic mean. As
pointed out earlier, the term “mean” is frequently used to designate the
arithmetic mean; consequently, when referring to any other mean such
as the geometric mean or the harmonic mean, we should always refer to
the measure by its complete designation.

The geometric mean. The geometric mean is defined as “the Nth
root of the product of the items.” Thus, for the four items 5, 8, 10, 12,
the geometric mean is

\[ G = \sqrt[4]{5 \times 8 \times 10 \times 12} = \sqrt[4]{4800} = 8.3. \]

It is interesting to note that the arithmetic mean of these four items is
8.75. For any series of positive values (not all the same), the geometric
mean is smaller than the arithmetic mean.* If one value of a series equals
zero, the geometric mean equals zero and is therefore inappropriate. If
one or more values are negative, the geometric mean can sometimes be
computed but may be meaningless. These are important drawbacks to
its use.

* For a demonstration, see Appendix S, section 9.3.

Symbolically, the geometric mean is
\(G = \sqrt[N]{X_1 \times X_2 \times X_3 \times \cdots \times X_N}\).
The computation is usually carried out by means of logarithms, thus:

\[ \log G = \frac{\log X_1 + \log X_2 + \log X_3 + \cdots + \log X_N}{N} = \frac{\Sigma \log X}{N}. \]

The logarithm of the geometric mean is thus the arithmetic mean of the
logarithms of the values.
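The logarithmic computation may be sketched as follows for the four items 5, 8, 10, 12 used above. The function name `geometric_mean` is ours, and common (base 10) logarithms are assumed only because they match the tables ordinarily used; any base gives the same result.

```python
import math

def geometric_mean(values):
    """Antilogarithm of the arithmetic mean of the logarithms."""
    if any(v <= 0 for v in values):
        # Zero or negative items make the measure inappropriate.
        raise ValueError("geometric mean requires positive values")
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log

items = [5, 8, 10, 12]
g = geometric_mean(items)
arithmetic = sum(items) / len(items)

print(round(g, 1), arithmetic)
```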

When frequencies are present, each logarithm must be multiplied by the
corresponding frequency. Thus

\[ \log G = \frac{f_1 \log X_1 + f_2 \log X_2 + f_3 \log X_3 + \cdots}{N} = \frac{\Sigma f \log X}{N}. \]

For a frequency distribution, the geometric mean is usually computed by:
(1) ascertaining the logarithm of the mid-value of each class, (2)
multiplying each logarithmic mid-value by its proper frequency, (3) summing
those products, (4) dividing by the number of items, and (5) taking the
anti-logarithm of the result. If a series is symmetrical in a logarithmic
sense (see Chapter 23) and the items are evenly distributed within the
classes geometrically instead of arithmetically, it is preferable to use the
mid-values of the logarithms of the class limits rather than the logarithms
of the mid-values of the classes. If the raw data are available, it is, of
course, also advisable to re-form the frequency distribution in order to
make the class intervals geometrically equal, if that had not already been
done.
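The five steps listed above may be sketched as follows. The mid-values and frequencies are hypothetical and serve only to show the arithmetic; the function name `grouped_geometric_mean` is likewise ours.

```python
import math

def grouped_geometric_mean(mid_values, frequencies):
    """Geometric mean of a frequency distribution from class mid-values."""
    n = sum(frequencies)                                  # number of items
    logs = [math.log10(x) for x in mid_values]            # step (1)
    products = [f * lg for f, lg in zip(frequencies, logs)]  # step (2)
    total = sum(products)                                 # step (3)
    mean_log = total / n                                  # step (4)
    return 10 ** mean_log                                 # step (5)

# Hypothetical distribution: four classes and ten items.
mids = [5, 15, 25, 35]
freqs = [2, 5, 2, 1]

g_grouped = grouped_geometric_mean(mids, freqs)

# Check: the Nth root of the product when each mid-value is repeated f times.
expanded = [x for x, f in zip(mids, freqs) for _ in range(f)]
g_direct = math.prod(expanded) ** (1 / len(expanded))

print(g_grouped, g_direct)
```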

It will be recalled that the arithmetic mean is the sum of the values
divided by the number, while the geometric mean is the Nth root of the
product of the values. As noted before, N times \(\bar{X}\) gives \(\Sigma X\). For the
geometric mean, \(G^N = X_1 \cdot X_2 \cdot X_3 \cdot\) etc.; that is, the geometric mean
raised to the Nth power equals the product of the values. This leads to
the rather interesting point that any series of numbers having the same
N and the same \(\Sigma X\) have the same arithmetic mean (for example, 1 and
11, 2 and 10, 4 and 8, 5 and 7, -2 and 14 all have an arithmetic mean of
6), and that any series of numbers having the same N and the same
product have the same geometric mean (for example, 1 and 36, 2 and 18,
4 and 9 all have the geometric mean of 6).

Another property of the geometric mean is that the product of the
ratios of the values on one side of the geometric mean to the geometric
mean is equal to the product of the ratios of the geometric mean to the
values on the other side of the geometric mean. To illustrate, let us take
the values 4, 5, 20, 25, the geometric mean of which is
\(\sqrt[4]{10000} = 10\). The ratios of the values 4 and 5 to the geometric
mean are \(\frac{4}{10}\) and \(\frac{5}{10}\), while the ratios of the geometric mean
to the values 20 and 25 are \(\frac{10}{20}\) and \(\frac{10}{25}\). Thus we have

\[ \frac{4}{10} \times \frac{5}{10} = \frac{10}{20} \times \frac{10}{25}, \qquad \frac{1}{5} = \frac{1}{5}. \]

We may also invert the ratios to write

\[ \frac{10}{4} \times \frac{10}{5} = \frac{25}{10} \times \frac{20}{10}, \qquad 5 = 5. \]

The following paragraphs discuss certain instances in which the
geometric mean is useful.

(1) The geometric mean may be used for averaging ratios. Consider
the following data:

                                              Ratio of          Ratio of
             Native-born    Foreign-born   foreign-born to   native-born to
Community    inhabitants    inhabitants      native-born      foreign-born
                                             (per cent)        (per cent)
A               8,000          4,000              50               200
B               1,500          3,000             200                50

The arithmetic mean of the two ratios of foreign-born to native-born
population is 125 per cent. Likewise, the arithmetic mean of the two
ratios of native-born to foreign-born population is 125 per cent! These
two averages are inconsistent with each other. This incongruous result
does not occur if we use the geometric mean, for the geometric mean of
each of the two pairs of ratios is \(\sqrt{0.50 \times 2.00} = 1.0\), or 100 per cent. We
could, of course, total or average the foreign-born inhabitants for the two
communities, and total or average the native-born inhabitants, thus
obtaining two ratios which are consistent. There are 7,000 foreign-born
and 9,500 native-born inhabitants, or an average of 3,500 foreign-born
and 4,750 native-born inhabitants. The ratio of foreign-born to
native-born is

\[ \frac{7{,}000}{9{,}500} = \frac{3{,}500}{4{,}750} = 73.7 \text{ per cent,} \]

and the ratio of native-born to foreign-born is

\[ \frac{9{,}500}{7{,}000} = \frac{4{,}750}{3{,}500} = 135.7 \text{ per cent.} \]

The product of these two ratios is 1. This arithmetic method, however,
does not assign equal weight to the two ratios. Observe that the
arithmetic method involves the ratio of the arithmetic means (or totals),
whereas the geometric procedure involves the geometric mean of the
ratios. We have here two different concepts. Which one to use in a
given situation depends upon the purpose. If we wish to establish a
typical ratio for a number of communities and wish that ratio to be
independent of the number of native-born or foreign-born persons present in
the various places (that is, we wish to assign equal weight to each ratio),
we may use the geometric mean of the ratios. If we wish to allow the
populations to exert an influence, we may determine the ratio of the totals
or arithmetic means. The question is not whether to use an arithmetic
or a geometric mean of the ratios, but whether to use a ratio based on
arithmetic means (or totals) or a geometric mean of ratios.

If the two ratios of foreign-born to native-born are averaged
arithmetically but weighted according to the native-born populations, the result is
73.7 per cent. If the two ratios of native-born to foreign-born are
averaged arithmetically but weighted according to the foreign-born
population, we obtain 135.7 per cent. These figures, of course, agree with those
obtained by taking the ratios of the totals.
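The several ways of averaging the two community ratios may be compared in a brief sketch. The figures are those of the table; the dictionary and function names are ours.

```python
import math

native = {"A": 8000, "B": 1500}
foreign = {"A": 4000, "B": 3000}

# Ratios for each community, as decimals rather than per cents.
f_to_n = {c: foreign[c] / native[c] for c in native}
n_to_f = {c: native[c] / foreign[c] for c in native}

def arithmetic_mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

def geometric_mean(xs):
    xs = list(xs)
    return math.prod(xs) ** (1 / len(xs))

am_f, am_n = arithmetic_mean(f_to_n.values()), arithmetic_mean(n_to_f.values())
gm_f, gm_n = geometric_mean(f_to_n.values()), geometric_mean(n_to_f.values())

# Ratio of the totals, and the arithmetic mean weighted by native-born population.
ratio_of_totals = sum(foreign.values()) / sum(native.values())
weighted = sum(f_to_n[c] * native[c] for c in native) / sum(native.values())

print(am_f, am_n)   # the two inconsistent arithmetic means
print(gm_f, gm_n)   # the two consistent geometric means
print(round(100 * ratio_of_totals, 1), round(100 * weighted, 1))
```

The two simple arithmetic means cannot both be right, since their product is not 1; the geometric means and the ratios of totals each satisfy that check.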

The geometric mean may be used when we wish to assign equal weight
to equal ratios of change. Suppose (a) that two commodities are selling
at $2 and $10 per unit; (b) that at a later date the first commodity doubles
in price while the second one is halved in price, and thus they sell for $4
and $5, respectively; and (c) that at a still later date the original price of
the first commodity is halved and becomes $1, while that of the second
commodity is doubled and becomes $20. The arithmetic mean under
these three situations yields: (a) $6; (b) $4.50; and (c) $10.50. The
geometric mean gives: (a) $4.47; (b) $4.47; and (c) $4.47. The assumption
used to justify the geometric mean is illustrated by saying that a doubling
in price offsets a halving in price, a quadrupling in price offsets a price of
one-fourth the original figure, and similarly for any other two ratios whose
product is 1. This characteristic will be referred to again concerning a
possible use of the geometric mean in connection with price index numbers.
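The three price situations can be reproduced with a short sketch; the names used are ours.

```python
import math

# Prices of the two commodities under situations (a), (b), and (c).
situations = {"a": (2, 10), "b": (4, 5), "c": (1, 20)}

results = {}
for label, prices in situations.items():
    am = sum(prices) / len(prices)
    gm = math.prod(prices) ** (1 / len(prices))
    results[label] = (am, round(gm, 2))
    print(label, am, round(gm, 2))
```

The geometric mean is unchanged because the product of the two prices is 20 in every situation: each doubling is matched by a halving.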

(2) Sometimes a frequency distribution is encountered which is
markedly skewed to the right. If, instead of plotting the mid-values of the
classes, we use the logarithms of the mid-values (or better, plot the
logarithmic mid-values, the geometric mean of each pair of limits, on a
logarithmic X-scale) and a symmetrical distribution results, a geometric
analysis may be proper. This is discussed more fully in Chapter 23.

(3) Probably the most frequently used application of the geometric
principle has to do with the determination of average per cent of change.
If a city had a population of 100,000 in a given year and 120,000 ten years
later, what was the average annual per cent of change? The change was
20 per cent over the entire period. If we take one-tenth of that figure, or
2 per cent, as the annual per cent of increase and compute a 2 per cent
increase each year over the preceding year, the second population figure
turns out to be 121,900! Obviously the correct figure is slightly smaller
than 2 per cent, since we are actually compounding. We may compute
the average annual per cent of change by using

\[ P_n = P_0(1 + r)^n, \]

where \(P_0\) = population at beginning of period;
      \(P_n\) = population at end of period;
      \(r\) = relative increase (or decrease) per year, expressed as a
            decimal;
      \(n\) = number of years.

For the data above,

\[ 120{,}000 = 100{,}000(1 + r)^{10}. \]

Solving this by the use of logarithms gives

\[ 5.079181 = 5.000000 + 10 \log (1 + r), \]
\[ \log (1 + r) = \frac{0.079181}{10} = 0.0079181, \]
\[ 1 + r = 1.0184, \]
\[ r = 1.84 \text{ per cent.} \]

The expression \(P_n = P_0(1 + r)^n\) is sometimes termed the compound
interest formula because of its usefulness in various problems involving
compound interest. We have used it above to determine average annual
per cent of growth.* Knowing values of any three of the four symbols
shown, we can solve for the fourth. Thus we may determine:

(a) Average annual per cent of change \(r\).

(b) Population a given number of years later \(P_n\), assuming a constant
    relative change.

(c) Number of years \(n\) until a given population will be attained, again
    assuming a constant relative change.

(d) Population a given number of years earlier, \(P_0\), if the per cent of
    change was constant.

It should be noted that the assumption of a constant relative change for
population is not valid over extended periods for any except possibly
“new” countries.

* In the above discussion we found the average per cent of growth between two
selected points. Sometimes we wish to find the average per cent of growth which
best describes a number of values for different years. Such an average is not
dependent upon only the first and last values of a series and is therefore more likely to be
a representative figure. A method of fitting a curve to obtain such an average is
given in Chapter 13.
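Uses (b), (c), and (d) follow from rearranging the same expression. The sketch below assumes a constant relative change throughout; the function names are ours, and the rate used for the check is the one implied by growth from 100,000 to 120,000 in ten years.

```python
import math

def later_population(p0, r, n):
    """(b) Pn = P0 * (1 + r)**n."""
    return p0 * (1 + r) ** n

def years_until(p0, pn, r):
    """(c) n = (log Pn - log P0) / log(1 + r)."""
    return (math.log10(pn) - math.log10(p0)) / math.log10(1 + r)

def earlier_population(pn, r, n):
    """(d) P0 = Pn / (1 + r)**n."""
    return pn / (1 + r) ** n

r = 1.2 ** (1 / 10) - 1   # rate implied by 100,000 growing to 120,000 in 10 years

print(later_population(100_000, r, 10))
print(years_until(100_000, 120_000, r))
print(earlier_population(120_000, r, 10))
```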

The harmonic mean. The harmonic mean H is the reciprocal of the
arithmetic mean of the reciprocals of the values. The expression is

\[ H = \frac{1}{\dfrac{\dfrac{1}{X_1} + \dfrac{1}{X_2} + \dfrac{1}{X_3} + \cdots + \dfrac{1}{X_N}}{N}}. \]

For purposes of computation, it is more convenient to use the form

\[ \frac{1}{H} = \frac{\dfrac{1}{X_1} + \dfrac{1}{X_2} + \dfrac{1}{X_3} + \cdots + \dfrac{1}{X_N}}{N} = \frac{\Sigma \dfrac{1}{X}}{N}. \]

The harmonic mean of the two values 3 and 12 is

\[ \frac{1}{H} = \frac{\dfrac{1}{3} + \dfrac{1}{12}}{2} = \frac{5}{24}, \qquad H = 4.8. \]

For these same values, the arithmetic mean is 7.5, while the geometric
mean is \(\sqrt{3 \times 12} = 6\). For any series of values (not all the same or not
including zero as one value), the harmonic mean is smaller than either the
geometric or the arithmetic mean.*
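The computation for the values 3 and 12 may be sketched as follows. Exact fractions are used so that 5/24 and its reciprocal appear without rounding; the function name `harmonic_mean` is ours.

```python
import math
from fractions import Fraction

def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals (integer items)."""
    mean_reciprocal = sum(Fraction(1, v) for v in values) / len(values)
    return 1 / mean_reciprocal

values = [3, 12]
h = harmonic_mean(values)
g = math.sqrt(math.prod(values))
a = sum(values) / len(values)

print(1 / h, float(h), g, a)
```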

The harmonic mean is so rarely computed for a frequency distribution
that we shall merely note the procedure, which consists of multiplying
the reciprocal of each mid-value (or mid-value of the reciprocals of the
class limits) by its frequency, adding these products, dividing by N, and
taking the reciprocal of the result.

* See Appendix S, section 9.4.

While the harmonic mean is not a measure of great importance, it is 
often confusing and hence we shall give a somewhat extended explanation 
and indicate several possible applications. 

Application (1). Although oranges are not usually priced in this
fashion, let us suppose that two grades of oranges are selling at 10 for $1
and 20 for $1. The arithmetic mean may be computed as

\[ \bar{X} = \frac{10 + 20}{2} = 15. \]

That is, 15 for $1, or $0.067 per orange. This is the price we must pay per
orange if we spend equal amounts of money for each grade. Paying $0.067
for each of 30 oranges, we shall spend $2.00 for the lot.

The harmonic mean gives a different result:

\[ \frac{1}{H} = \frac{\dfrac{1}{10} + \dfrac{1}{20}}{2} = \frac{3}{40}, \qquad H = 13\tfrac{1}{3}. \]

That is, 13 1/3 for $1, or $0.075 per orange. This is the price we must pay
per orange if equal numbers of oranges are bought at each price. Thus, if
we buy 15 oranges at 10 for $1 and 15 oranges at 20 for $1, we shall spend
$2.25 for all 30. Similarly, if we buy 30 oranges at $0.075 each, we shall
spend $2.25 for the lot.

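The harmonic mean will give the same results as the arithmetic mean
if we weight by the quantities bought at each price. Thus

\[ H = \frac{10 + 20}{10\left(\dfrac{1}{10}\right) + 20\left(\dfrac{1}{20}\right)} = \frac{30}{2} = 15 \text{ oranges per \$1, or \$0.067 per orange,} \]

assuming equal amounts of money spent for each grade.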

If prices are quoted in the usual way, as so much per dozen, these
oranges are selling at $1.20 per dozen and $0.60 per dozen. The simple
arithmetic mean is:

\[ \bar{X} = \frac{\$1.20 + \$0.60}{2} = \$0.90 \text{ per dozen, or \$0.075 per orange.} \]


It is the same as the first harmonic mean, since we are assuming in our
computation that equal quantities are to be bought at each price.
(Identical results are obtained if the quotations are per orange instead of per
dozen oranges.) On the other hand, if we consider that 10 oranges may
be bought at $1.20 per dozen and 20 oranges may be bought at $0.60 per
dozen, we have

\[ \bar{X} = \frac{(\$1.20 \times 10) + (\$0.60 \times 20)}{30} = \$0.80 \text{ per dozen, or \$0.067 per orange.} \]

This result is the same as obtained in our first and third calculations, since
we have assumed that equal amounts of money are to be spent for each
grade of orange.
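The two readings of the orange illustration may be compared in a short sketch. Exact fractions avoid rounding; the variable names are ours, and the purchases ($1 on each grade, or 15 oranges of each grade) are those assumed in the text.

```python
from fractions import Fraction

rates = [10, 20]  # oranges per $1 for the two grades

# Arithmetic and harmonic means of the quotations (oranges per $1).
arith_rate = Fraction(sum(rates), len(rates))
harm_rate = len(rates) / sum(Fraction(1, r) for r in rates)

# Equal money: $1 spent on each grade buys 10 + 20 oranges for $2.
price_equal_money = Fraction(2, sum(rates))

# Equal quantities: 15 oranges of each grade.
cost_equal_qty = sum(Fraction(15, r) for r in rates)
price_equal_qty = cost_equal_qty / 30

print(arith_rate, float(price_equal_money))
print(harm_rate, float(cost_equal_qty), float(price_equal_qty))
```

The arithmetic mean of the quotations corresponds to spending equal money, the harmonic mean to buying equal quantities.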

In the above illustrations the harmonic mean has furnished no
information not already available by use of the arithmetic mean. The harmonic
mean may be useful, however, when data are customarily or conveniently
given in terms of problems solved per minute, miles covered per hour,
units purchased per dollar, and so forth.

The arithmetic mean and the harmonic mean give consistent results if
proper consideration is given to (a) how the data are quoted and (b) what
weights are to be used. Taking prices as an illustration, the table below
sets forth the relationships. Expressions 1, 2, 3, 4 give results consistent
with each other. Similarly, expressions I, II, III, IV give consistent
results.


                 |                        If the assumption is:
If prices are    | Equal amounts of money spent for       | Equal number of units of each grade
quoted in        | each grade or commodity                | or commodity bought at each price
terms of:        |                                        |
-----------------+----------------------------------------+---------------------------------------
Price per unit   | 1. \(\bar{X}\), weighted by quantities for  | I. \(\bar{X}\), weighted by number of units
                 |    equal amounts of money (in this     |    (or equally)
                 |    case, units per dollar)             |
                 | 2. H, weighted by dollars              | II. H, weighted by dollars for equal
                 |    (or equally)                        |    numbers of units (or price per unit)
-----------------+----------------------------------------+---------------------------------------
Units per dollar | 3. \(\bar{X}\), weighted by dollars         | III. \(\bar{X}\), weighted by dollars for equal
                 |    (or equally)                        |    numbers of units (or price per unit)
                 | 4. H, weighted by quantities for       | IV. H, weighted by number of units
                 |    equal amounts of money (in this     |    (or equally)
                 |    case, units per dollar)             |

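Consider commodity A as selling at 4 units for $1, or $0.25 each, and
commodity B as selling at 10 units for $1, or $0.10 each.

If equal amounts of money are to be spent for each commodity:

\[ 1.\quad \bar{X} = \frac{(0.25 \times 4) + (0.10 \times 10)}{14} = \frac{2.00}{14} = \$0.1429 \text{ per unit, or 7 for \$1.} \]

\[ 2.\quad H = \frac{2}{\dfrac{1}{0.25} + \dfrac{1}{0.10}} = \frac{2}{14} = \$0.1429 \text{ per unit, or 7 for \$1.} \]

\[ 3.\quad \bar{X} = \frac{(4 \times 1) + (10 \times 1)}{2} = \frac{14}{2} = 7 \text{ for \$1, or \$0.1429 per unit.} \]

\[ 4.\quad H = \frac{14}{4\left(\dfrac{1}{4}\right) + 10\left(\dfrac{1}{10}\right)} = \frac{14}{2} = 7 \text{ for \$1, or \$0.1429 per unit.} \]

If equal numbers of units of each commodity are to be bought at each
price:

\[ \text{I.}\quad \bar{X} = \frac{(0.25 \times 1) + (0.10 \times 1)}{2} = \frac{0.35}{2} = \$0.175 \text{ per unit, or 5.71 for \$1.} \]

\[ \text{II.}\quad H = \frac{0.35}{0.25\left(\dfrac{1}{0.25}\right) + 0.10\left(\dfrac{1}{0.10}\right)} = \frac{0.35}{2} = \$0.175 \text{ per unit, or 5.71 for \$1.} \]

\[ \text{III.}\quad \bar{X} = \frac{(4 \times 0.25) + (10 \times 0.10)}{0.35} = \frac{2.00}{0.35} = 5.71 \text{ for \$1, or \$0.175 per unit.} \]

\[ \text{IV.}\quad H = \frac{2}{\dfrac{1}{4} + \dfrac{1}{10}} = \frac{2}{\dfrac{14}{40}} = \frac{80}{14} = 5.71 \text{ for \$1, or \$0.175 per unit.} \]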
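The eight expressions may be checked together. The sketch below uses exact fractions; the two helper functions and the variable names are ours, and the weights are those named in the table (quantities or dollars for a $1 purchase of each commodity, or for a purchase of one unit of each).

```python
from fractions import Fraction as F

def weighted_arithmetic(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def weighted_harmonic(values, weights):
    return sum(weights) / sum(w / v for v, w in zip(values, weights))

price = [F(1, 4), F(1, 10)]  # dollars per unit: $0.25 and $0.10
rate = [F(4), F(10)]         # units per dollar

# Equal amounts of money ($1 each): quantities are 4 and 10 units.
e1 = weighted_arithmetic(price, [4, 10])
e2 = weighted_harmonic(price, [1, 1])
e3 = weighted_arithmetic(rate, [1, 1])
e4 = weighted_harmonic(rate, [4, 10])

# Equal numbers of units (one each): dollars spent are $0.25 and $0.10.
eI = weighted_arithmetic(price, [1, 1])
eII = weighted_harmonic(price, price)
eIII = weighted_arithmetic(rate, price)
eIV = weighted_harmonic(rate, [1, 1])

print(float(e1), float(e2), float(e3), float(e4))
print(float(eI), float(eII), float(eIII), float(eIV))
```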


From what has just been said it may be observed that (for either assumption), when averaging fractions (ratios) by the arithmetic or harmonic method, we use the arithmetic mean if weights are in the same terms as the denominator, the harmonic mean if weights are in the same terms as the numerator. Of course, if weights are in the same terms as the numerator, they may be converted into terms of the denominator and the arithmetic mean employed.

Suppose that a transaction consists of 40 handkerchiefs sold at 10 for $1 and 60 handkerchiefs sold at 20 for $1. Now we are not interested in either of the assumptions mentioned above. What we desire is the mean price when 40 handkerchiefs sell at 10 for $1 and 60 sell at 20 for $1. Using the quotations as given (that is, in terms of number of units per dollar), we may use the harmonic mean with quantity weights. Thus

\[ H = \frac{40 + 60}{\frac{40}{10} + \frac{60}{20}} = \frac{100}{7} \]

= 14 2/7 per $1, or $0.07 each.

Still using the quotations in terms of units per dollar, we may obtain the same result by employing the arithmetic mean, if our weights are amounts of money spent for each grade ($4 and $3). Thus

\[ \bar{X} = \frac{(10 \times 4) + (20 \times 3)}{4 + 3} = \frac{100}{7} \]

= 14 2/7 per $1, or $0.07 each.

If we shift our quotations to price per unit, we have 40 handkerchiefs sold at $0.10 each and 60 sold at $0.05 each. Now, using the harmonic mean, we weight by amounts of money spent for each grade. Thus

\[ H = \frac{4 + 3}{\frac{4}{0.10} + \frac{3}{0.05}} = \frac{7}{100} \]

= $0.07 each, or 14 2/7 per $1.

Finally, using the arithmetic mean of prices per unit and weighting by quantities sold, we have

\[ \bar{X} = \frac{(0.10 \times 40) + (0.05 \times 60)}{40 + 60} = \frac{7}{100} \]

= $0.07 each, or 14 2/7 per $1.
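The four routes to the mean handkerchief price can be checked numerically. What follows is only a minimal sketch in Python: the function and variable names are illustrative assumptions and do not come from the text, and exact fractions are used so that rounding does not obscure the agreement.

```python
from fractions import Fraction


def weighted_arithmetic_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_harmonic_mean(values, weights):
    """Sum of the weights divided by the sum of weight / value."""
    return sum(weights) / sum(w / v for v, w in zip(values, weights))


# 40 handkerchiefs at 10 for $1 and 60 handkerchiefs at 20 for $1.
quantities = [Fraction(40), Fraction(60)]
units_per_dollar = [Fraction(10), Fraction(20)]
dollars_per_unit = [1 / u for u in units_per_dollar]
dollars_spent = [q * p for q, p in zip(quantities, dollars_per_unit)]

# Quotations in units per dollar: the denominator is dollars, the numerator units.
per_dollar_harmonic = weighted_harmonic_mean(units_per_dollar, quantities)
per_dollar_arithmetic = weighted_arithmetic_mean(units_per_dollar, dollars_spent)

# Quotations in dollars per unit: the denominator is units, the numerator dollars.
per_unit_harmonic = weighted_harmonic_mean(dollars_per_unit, dollars_spent)
per_unit_arithmetic = weighted_arithmetic_mean(dollars_per_unit, quantities)

print(per_dollar_harmonic, per_dollar_arithmetic)
print(per_unit_harmonic, per_unit_arithmetic)
```

In each pair the arithmetic mean takes weights in the terms of the denominator and the harmonic mean takes weights in the terms of the numerator, which is the rule stated above.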


Application (2). Occasionally a frequency distribution may be encountered which is so skewed to the right that, when plotted in terms of the reciprocals of the class mid-values, it assumes an approximately normal form. In such instances harmonic treatment may be indicated. Such cases are rather unusual, however, and will not be treated in this book.

Application (3). An interesting and apparently valid application of the harmonic mean is given in an article by Holbrook Working.* In his study of the factors influencing the price of potatoes, Working uses the harmonic mean because, as he points out, a low price during part of a season will be compensated only by a disproportionally high price during the remainder of the season. To illustrate, we have selected the monthly prices for one crop year and have shown them in Chart 9.5. When the reciprocals or the logarithms are plotted, the curve is straighter than when the arithmetic values are plotted, the reciprocals giving perhaps the most nearly straight line. This indicates that the harmonic mean is not inappropriate as a measure of the average price of potatoes during a season.

* Holbrook Working, Factors Determining the Price of Potatoes in St. Paul and Minneapolis, Technical Bulletin 10, University of Minnesota Agricultural Experiment Station, pp. 9 and 10.

Chart 9.5. Price of Potatoes per Bushel in Minneapolis and St. Paul, September 1919-May 1920: A. Price, B. Logarithm of Price, C. Reciprocal of Price. Data from Holbrook Working, ibid.

It is sometimes argued that the geometric mean should be used for series of data having a definite lower limit and an indefinite upper limit. One type of such data is price relatives, which, having a base of 100, may fall to 0 but rise to ∞. The question is not so much one of the existence of such limits as it is one of what values may actually occur and how the limits are approached (arithmetically, geometrically, or reciprocally); that is, whether, if we are dealing with a frequency distribution, the series is approximately symmetrical in terms of X, skewed but approximately symmetrical in terms of log X, or skewed but approximately normal in terms of 1/X.

In an arithmetic sense, a price drop of 33.3 per cent is offset by a price rise of 33.3 per cent (of the original base), a decline of 50 per cent is offset by a rise of 50 per cent, and a fall of 90 per cent is offset by a rise of 90 per cent. Thus

\[ \frac{66.7 + 133.3}{2} = 100, \qquad \frac{50 + 150}{2} = 100, \qquad \frac{10 + 190}{2} = 100. \]

In a geometric sense, a price drop of 33.3 per cent is offset by a rise of 50 per cent (of the original base), a fall of 50 per cent is offset by a rise of 100 per cent, and a drop of 90 per cent is offset by a rise of 900 per cent. Thus

\[ \sqrt{66.7 \times 150} = 100, \qquad \sqrt{50 \times 200} = 100, \qquad \sqrt{10 \times 1000} = 100. \]

In a reciprocal sense, a price drop of 33.3 per cent is offset by a rise of 100 per cent (of the original base), a fall of 50 per cent is offset by a rise to ∞, and a fall of more than 50 per cent cannot be offset by any rise however great. Thus

\[ \frac{2}{\frac{1}{66.7} + \frac{1}{200}} = 100, \qquad \frac{2}{\frac{1}{50} + \frac{1}{\infty}} = 100. \]
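The three senses of "offsetting" can be compared by solving each mean for the value that restores the base of 100. The sketch below is illustrative only: the function name `offsetting_value` and its arguments are assumptions made for this example, and 66.7 is represented exactly as 200/3.

```python
import math
from fractions import Fraction


def offsetting_value(low, kind, base=100):
    """Value that, averaged with `low` by the named mean, gives `base`.

    Returns math.inf when only an unbounded rise would do, and None when
    no rise, however great, can restore the base.
    """
    low = Fraction(low)
    base = Fraction(base)
    if kind == "arithmetic":
        # (low + high) / 2 = base
        return 2 * base - low
    if kind == "geometric":
        # sqrt(low * high) = base
        return base * base / low
    if kind == "harmonic":
        # 2 / (1/low + 1/high) = base, so 1/high = 2/base - 1/low
        reciprocal = 2 / base - 1 / low
        if reciprocal > 0:
            return 1 / reciprocal
        return math.inf if reciprocal == 0 else None
    raise ValueError(f"unknown kind: {kind}")


for low in (Fraction(200, 3), 50, 10):
    row = [offsetting_value(low, kind) for kind in ("arithmetic", "geometric", "harmonic")]
    print(float(low), row)
```

A relative of 10 has no harmonic offset at all, which is the sense in which a fall of more than 50 per cent cannot be made good by any rise.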


There are a number of other measures of central tendency which are of mathematical and theoretical rather than of practical interest. One of these is the quadratic mean:

\[ \sqrt{\frac{\Sigma X^2}{N}}. \]

This is the square root of the arithmetic mean of the squares of the values. Unless all the values are the same, the quadratic mean exceeds the arithmetic mean. The quadratic mean is mentioned here because the concept is important. Although we do not use the term "quadratic" or "mean," we shall shortly compute the quadratic mean of the deviations from the arithmetic mean. It will not be a measure of central tendency, but a measure of dispersion; we shall call it the standard deviation, or s, and its expression is

\[ s = \sqrt{\frac{\Sigma x^2}{N}}. \]

Symbols Used in Chapter 10

AD: the average (or mean) deviation.

α₃: lower-case Greek alpha, a measure of skewness using the third powers of the x values. For α₁ and α₂, see footnote.

α₄: lower-case Greek alpha, a measure of kurtosis using the fourth powers of the x values.

β₁: lower-case Greek beta, a measure of skewness using the third powers of the x values.

β₂: lower-case Greek beta, a measure of kurtosis using the fourth powers of the x values.

d: deviation of an X value from X̄_d.

d′: deviation, in terms of class intervals, of an X value from X̄_d.

f: a frequency.

h: a measure of uniformity, the reciprocal of √(2s²).

i: the class interval.

M: used with s to indicate a specified multiple of s.

Med: the median.

Mo: the mode.

μ₁, μ₂, μ₃, μ₄: lower-case Greek mu; respectively, the first, second, third, and fourth moments about X̄, with Sheppard's corrections. μ₁ = 0 and μ₃ = π₃.

N: the number of items in a sample.

ν₁, ν₂, ν₃, ν₄: lower-case Greek nu; respectively, the first, second, third, and fourth moments about X̄_d.

P₁, P₂, ..., P₉₉: the percentiles.

π₁, π₂, π₃, π₄: lower-case Greek pi; respectively, the first, second, third, and fourth moments about X̄. π₁ = 0.

Q: the semi-interquartile range.

Q₁, Q₃: quartiles. Q₂ = Med.

s: the standard deviation of a sample.

s²: the variance of a sample.

the standard deviation of a sample, with Sheppard's correction.

Sk: the Pearsonian measure of skewness.

Sk_q: a measure of skewness based on the quartiles.

σ̂: lower-case Greek sigma, "sigma caret" or "sigma hat," estimate of the standard deviation of a population.

σ: lower-case Greek sigma, the standard deviation of a population.



Σ: upper-case Greek sigma, meaning "take the sum."

V: the coefficient of variation.

x: deviation of X from X̄.

X: a value in a series; also, the mid-value of a class in a frequency distribution.

X̄: the arithmetic mean. In later chapters we shall distinguish between the arithmetic mean of a sample, X̄, and the arithmetic mean of the population, X̄_P.

X̄_d: a designated mean.

| |: disregard signs; thus, Σ|x| means "take the sum of the x values without regard to signs."



CHAPTER 10 




Dispersion, Skewness, and Kurtosis 


In the preceding chapter we considered certain measures which attempted to describe the central tendency of a frequency distribution. There are other aspects of frequency distributions which are also important. First we shall consider the dispersion, or spread of the data. Two counties may each show an average yield of wheat of 15 bushels to the acre; but, if the data are considered farm by farm, one county may exhibit extreme values ranging from 10 to 20 bushels per acre, while the other may show yields as low as 5 bushels per acre and as high as 25 bushels per acre. If such a crude measure of dispersion may be used, it is apparent that there is greater uniformity of yield in the first county. Chart 10.1 shows two symmetrical curves which have the same mean but which differ in respect to dispersion.

Chart 10.1. Curves Having Different Dispersions.

Chart 10.2. A Curve Skewed to the Right (Solid Line) and a Symmetrical Curve (Broken Line).

If a frequency curve or frequency distribution is not symmetrical, it is said to be skewed, or asymmetrical. Most frequency distributions exhibit more or less skewness. Chart 10.2 shows two curves, one of which is symmetrical and one of which is skewed. The skewed curve is skewed to the right, the direction in which the excess tail appears.

Curves of frequency distributions may be symmetrical but may differ from each other in regard to the amount of kurtosis present. The basis of reference is the normal or mesokurtic curve discussed in Chapter 23. A leptokurtic curve has a narrower central portion and higher tails than does the normal curve. A comparison of these two is shown in Chart 10.3. Chart 10.4 shows a platykurtic curve and a normal curve. As may be seen, the platykurtic curve has a broader central portion and lower tails.

Chart 10.3. A Leptokurtic Curve (Solid Line) and a Normal or Mesokurtic Curve (Broken Line).

MEASURES OF ABSOLUTE DISPERSION


The mean annual temperature at Lexington, Kentucky is 55.2 degrees. The mean annual temperature at San Francisco, California is 55.7 degrees, which is very little different from the temperature at Lexington. These two figures do not, however, suffice to characterize this aspect of the climatic conditions of the two cities. The temperature at Lexington has been known to fall as low as -20 degrees and to rise as high as 108 degrees. In San Francisco the lowest recorded temperature is 20 degrees and the highest is 104 degrees. It is quite apparent that there is greater variability of temperature at Lexington than at San Francisco.

Chart 10.4. A Platykurtic Curve (Solid Line) and a Normal or Mesokurtic Curve (Broken Line).

Let us consider a second illustration. A buyer for a large department store has been offered two types of electric lights for use in the store. The salesmen each claim about the same average length of life for their bulbs. The buyer obtains from a testing laboratory test data for 40-watt lamps of the two makes and finds that the average life of each of the two kinds of bulbs is about 1,000 hours. Examining the data further, however, shows that in one batch of bulbs a lamp burned out at 325 hours while one lasted 1,570 hours. In the other batch one lamp lasted but 105 hours, while one did not burn out until the expiration of 2,910 hours. This limited information indicates a greater degree of uniformity among lamps of the first batch.

The range. The measurement of dispersion may be made in a crude form by referring to the lowest and the highest values, as was done in the preceding paragraphs. This is a very simple and easy-to-understand measure. The range gives a comprehensive value for the data in that it includes the limits within which all of the items occurred. However, the range has certain disadvantages. It fails to give any consideration to the arrangement of the values between the two extreme values.* Furthermore, the range is misleading if either of the extreme values is an unusual occurrence.

Referring to the cadet-midshipmen's grades in Table 10.3, it is observed that the range is 71.95 (the lower limit of the first class) to 89.95 (the upper limit of the last class). If we have the array to refer to, as in Table 8.2, the range may be given a little more accurately as 72.1 to 89.0. The range from the frequency distribution merely tells us that no one in the class received a grade below 71.95 or above 89.95. The range is usually stated as the difference between the two extreme values. For the cadet-midshipmen, 89.95 - 71.95 = 18.00. However, if only this single figure is given, we do not know whether the range is from 0 to 18, or from 78 to 96, or what the limits may be.

The 10-90 percentile range. Sometimes we are interested in knowing the range within which a certain proportion of the items fall. One such range, which is occasionally used in educational measurement, is the 10-90 percentile range. This measure excludes the lowest 10 per cent and the highest 10 per cent, giving the two values between which the central 80 per cent of the items occur. Of course, the 10th percentile is the 1st decile, and the 90th percentile is the 9th decile. The measure is usually referred to, however, as the 10-90 percentile range, rather than the 1-9 decile range, since the former carries more clearly the idea of the central 80 per cent.

The 10-90 percentile range is not affected by extreme values as is the range. However, this measure has a very serious shortcoming in that it does not make use of the values of all the items. As a result, the values below the 10th percentile (or above the 90th percentile) could be massed closely together or spread out widely; the effect upon the 10-90 percentile range would be the same. Also, the values between the 10th percentile and the 90th percentile could be arranged in any conceivable manner so long as they are somewhere between the 10th and 90th percentiles.

* It must be obvious that when N = 2, this difficulty does not exist. It is of minor importance for small samples drawn from a normal population.

The quartile deviation. In Chapter 9 mention was made of Q₁ and Q₃, the lower and the upper quartiles. A measure of dispersion based upon these values is termed the quartile deviation, or the semi-interquartile range. It is given by

\[ Q = \frac{Q_3 - Q_1}{2}. \]

If a series is symmetrical, it is clear that Q₁ and Q₃ are equidistant from the median. Therefore, if we measure ±Q from the median, we include 50 per cent of the items of the series, for we have measured back to Q₁ and Q₃. If a series is skewed, as is usually true, we may take ±Q around the median, and, while we shall not arrive at either Q₁ or Q₃, we may expect to include approximately 50 per cent of the items unless the skewness is great.

The quartile deviation, like the 10-90 percentile range, is not affected by extreme values and also fails to consider the values of all the items.

The average deviation. The average deviation, or the mean deviation, as it is sometimes called, is usually measured in relation to the arithmetic mean. The average deviation is obtained by taking the sum of the deviations of the items from the arithmetic mean, without regard to signs, and dividing by the number of items. It will be recalled that Σx = 0, and it is for this reason that the signs of the various x values are neglected. Thus,

\[ AD = \frac{\Sigma |x|}{N}, \]

or, for a frequency distribution,

\[ AD = \frac{\Sigma f|x|}{N}, \]

where | | means that the signs are neglected. Because the sum of the deviations (signs neglected) is a minimum when taken around the median, the mean deviation is sometimes computed in relation to the median. In practice, however, the mean is generally used and, if the series is symmetrical, the resulting AD is the same. Since AD is of limited usefulness compared to the measure of dispersion next discussed, the computation of AD is not shown here. The determination of AD for a frequency
distribution is illustrated in the first edition of this book on pages 23() 
and 230, 

If a distribution is normal, 57.5 per cent of the items are included within the range of X̄ ± AD. If the distribution is moderately skewed, this will be found to be approximately true.

The standard deviation, ungrouped data. Instead of merely neglecting the signs of the deviations from the arithmetic mean, we may square the deviations, thereby making all of them positive. Thus, we may have a measure

\[ s^2 = \frac{\Sigma x^2}{N}, \]

the variance or mean square deviation. (At a later point we shall use the term variation to refer to Σx².) s² is also known as the second moment, π₂, of the distribution, since the deviations have been raised to the second power. We shall make use of the variance in later sections of the book.

At this point we are interested in the square root of this measure,

\[ s = \sqrt{\frac{\Sigma x^2}{N}}, \]

which is termed the standard deviation or, occasionally, the root-mean-square deviation. It has been pointed out previously that Σx² is a minimum when taken around the arithmetic mean.* Therefore, the standard deviation is always computed in reference to the arithmetic mean. As the above expression indicates, the steps involved in computing s are:

(1) Determine the deviation x of each item from X̄;

(2) Square these deviations;

(3) Total them;

(4) Divide this sum by N;

(5) Take the square root.

TABLE 10.1

Computation of Standard Deviation for Scores of 15 Persons in Recalling Trade Names of Advertised Products by Use of the Expression

\[ s = \sqrt{\frac{\Sigma x^2}{N}} \]

Subject    Score X        x           x²
   1          12       -20.87      435.56
   2          21       -11.87      140.90
   3          21       -11.87      140.90
   4          23        -9.87       97.42
   5          27        -5.87       34.46
   6          28        -4.87       23.72
   7          30        -2.87        8.24
   8          34         1.13        1.28
   9          37         4.13       17.06
  10          39         6.13       37.58
  11          39         6.13       37.58
  12          39         6.13       37.58
  13          40         7.13       50.84
  14          49        16.13      260.18
  15          54        21.13      446.48
Total        493                 1,769.78

Data from S. M. Newhall and M. H. Heim, "Memory Value of Absolute Size in Magazine Advertising," Journal of Applied Psychology, Vol. 13, 1929, pp. 62-75. The above data were for advertisements of 150 square inches each, and each was observed for 5 seconds. The maximum possible score was 81.

\[ \bar{X} = \frac{493}{15} = 32.87. \]

\[ s = \sqrt{\frac{1{,}769.78}{15}} = \sqrt{117.99} = 10.9. \]

The computation of s for a series of ungrouped data is shown in Table 10.1. This procedure involves the computation of x for every item, and would be a rather laborious procedure if there were an appreciably larger number of items. The value of s may be obtained, without computing each x, by means of the expression†

\[ s = \sqrt{\frac{\Sigma X^2}{N} - \left(\frac{\Sigma X}{N}\right)^2}. \]

The computation of s by this shorter method is illustrated in Table 10.2. Notice that the correction (ΣX/N)² is subtracted. This is always true. The sum of the squared deviations is least when taken around X̄. We, however, took our deviations around some other value (0, in this instance), and these squared deviations are therefore too large.

Referring to Table 10.1, it will be observed that the value of X̄ was rounded to two decimals, and thus each value x and x² is an approximation. If X̄ and x are shown to sufficient digits, results by the two methods will be the same. Here, both methods yield 10.9.

At this point it may be well to note that s measures the dispersion in the sample. In Chapter 24 we shall discuss σ, the population standard deviation, and σ̂, an estimate of the population standard deviation based upon a sample.
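The agreement of the long and short methods can be verified directly. The following is a minimal sketch, assuming the 15 scores as read from Table 10.1 (each score is the stated mean plus its tabulated deviation); the variable names are illustrative only.

```python
import math

# Scores of the 15 subjects, as read from Table 10.1.
scores = [12, 21, 21, 23, 27, 28, 30, 34, 37, 39, 39, 39, 40, 49, 54]
n = len(scores)
mean = sum(scores) / n

# Long method: square each deviation from the mean, total, divide by N, take the root.
s_long = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)

# Short method: deviations taken around zero, then the correction (sum X / N)^2 is subtracted.
s_short = math.sqrt(sum(x * x for x in scores) / n - (sum(scores) / n) ** 2)

print(round(mean, 2), round(s_long, 1), round(s_short, 1))
```

Because the mean is carried at full precision here rather than rounded to two decimals, the two results differ only by floating-point error.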

The standard deviation, grouped data. Before considering the properties of s, let us see how to compute s for a frequency distribution. Since frequencies are present,

\[ s = \sqrt{\frac{\Sigma f x^2}{N}}, \]

where x now represents the deviation of a class mid-value from the mean. Table 10.3 illustrates the computation of s for the cadet-midshipmen's grades. It is fairly obvious that this method, involving the determination of a number of x values, is cumbersome.

* For a demonstration, see Appendix S, section 10.1.

† For proof of this expression, see Appendix S, section 10.2.

TABLE 10.2

Computation of Standard Deviation for Scores of 15 Persons in Recalling Trade Names of Advertised Products by Use of the Expression

\[ s = \sqrt{\frac{\Sigma X^2}{N} - \left(\frac{\Sigma X}{N}\right)^2} \]

Subject    Score X       X²
   1          12         144
   2          21         441
   3          21         441
   4          23         529
   5          27         729
   6          28         784
   7          30         900
   8          34       1,156
   9          37       1,369
  10          39       1,521
  11          39       1,521
  12          39       1,521
  13          40       1,600
  14          49       2,401
  15          54       2,916
Total        493      17,973

Data from same source as Table 10.1.

\[ s = \sqrt{\frac{17{,}973}{15} - \left(\frac{493}{15}\right)^2} = \sqrt{1{,}198.20 - 1{,}080.22} = \sqrt{117.98} = 10.9. \]


A short method for s is available which allows us to take the mid-value of any class as the assumed mean, work with deviations around this value, and make the necessary correction. The expression is*

\[ s = \sqrt{\frac{\Sigma f d^2}{N} - \left(\frac{\Sigma f d}{N}\right)^2}. \]

To further shorten the process, the deviations are taken in terms of classes, giving

\[ s = i\sqrt{\frac{\Sigma f (d')^2}{N} - \left(\frac{\Sigma f d'}{N}\right)^2}, \]

where d′ indicates the deviation of a class mid-value from the assumed mean in terms of classes and i is the class interval. It is of interest to note that the correction factor (Σfd′/N)² is the square of the correction factor used in computing the arithmetic mean by the short method. The computation of s by this shorter procedure is shown in Table 10.4.

* For demonstration, see Appendix S, section 10.2.
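The short method in class-interval units can be set beside the direct grouped-data computation. The sketch below assumes the frequencies and mid-values of the cadet-midshipmen grade distribution (Table 10.3); the choice of the fourth class (mid-value 78.95) as the assumed mean is a hypothetical one made for illustration, and any other class would give the same result.

```python
import math

# Frequencies and class mid-values of the grade distribution (class interval i = 2).
frequencies = [7, 31, 42, 54, 33, 24, 22, 8, 4]
interval = 2.0
mid_values = [72.95 + interval * k for k in range(len(frequencies))]
n = sum(frequencies)

# Direct method: deviations x of the mid-values from the arithmetic mean.
mean = sum(f * x for f, x in zip(frequencies, mid_values)) / n
s_direct = math.sqrt(
    sum(f * (x - mean) ** 2 for f, x in zip(frequencies, mid_values)) / n
)

# Short method: deviations d' counted in classes from an assumed mean.
assumed_class = 3  # hypothetical choice: the class whose mid-value is 78.95
d_prime = [k - assumed_class for k in range(len(frequencies))]
sum_fd = sum(f * d for f, d in zip(frequencies, d_prime))
sum_fd2 = sum(f * d * d for f, d in zip(frequencies, d_prime))
s_short = interval * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)

print(round(mean, 2), round(s_direct, 2), round(s_short, 2))
```

The short method needs only small integers until the final step, which is why it was preferred for hand computation.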


TABLE 10.3

Computation of the Standard Deviation for Grades of the 1952 Graduating Class of the United States Merchant Marine Academy by Use of the Expression

\[ s = \sqrt{\frac{\Sigma f x^2}{N}} \]

              Number of     Mid-values
              cadet-        of classes
Grade         midshipmen f      X        x = X - X̄       x²          fx²
72.0-73.9          7          72.95       -6.66        44.3556      310.4892
74.0-75.9         31          74.95       -4.66        21.7156      673.1836
76.0-77.9         42          76.95       -2.66         7.0756      297.1752
78.0-79.9         54          78.95       -0.66         0.4356       23.5224
80.0-81.9         33          80.95       +1.34         1.7956       59.2548
82.0-83.9         24          82.95       +3.34        11.1556      267.7344
84.0-85.9         22          84.95       +5.34        28.5156      627.3432
86.0-87.9          8          86.95       +7.34        53.8756      431.0048
88.0-89.9          4          88.95       +9.34        87.2356      348.9424
Total            225                                              3,038.6500

X̄ = 79.61.

\[ s = \sqrt{\frac{3{,}038.6500}{225}} = \sqrt{13.5051} = 3.67. \]


Properties of the standard deviation. Of the various measures of absolute dispersion which have been mentioned, the standard deviation (and its square, the variance) is by far the most important. It will be used in connection with various statistical methods described hereafter. One important consideration is that it is one of the factors involved in the equation for the normal curve and for various skewed curves, discussed in Chapter 23. It is also used in testing the reliability of certain statistical measures, in correlation, and in connection with business cycle analysis.

TABLE 10.4

The standard deviation is the most frequently used measure of the spread of a series of data. If ±s is measured from the arithmetic mean of a normal distribution, 68.27 per cent of the items are included; within the range of X̄ ± 2s, 95.45 per cent are included; and within X̄ ± 3s, 99.73 per cent,* or nearly all, of the items are included. Chart 10.5 illustrates what has just been said. The percentages just given refer to a normal curve. If the distribution is skewed, these percentages will be only approximately realized. For the cadet-midshipmen grades (Table 10.4), X̄ ± s is 79.61 ± 3.67 = 75.94 and 83.28. To ascertain the proportion of cadet-midshipmen in Table 10.4 who fall between 75.94 and 83.28, we first determine the number occurring between 75.94 and 75.95 (the upper limit of the second class), which is 0.2; then we include all of the frequencies in the next three classes, after which we compute the number between 81.95 (the lower limit of the sixth class) and 83.28, which is 16.0. The total is 145.2, or 64.5 per cent. Within X̄ ± 2s (that is, from 72.27 to 86.95), we find 215.9, or 96.0 per cent of the grades. Within X̄ ± 3s (68.60 to 90.62), all of the 225 grades are included.
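The counting just described amounts to linear interpolation within the classes that the limits cut. A minimal sketch follows, assuming the class limits (71.95 to 89.95 in steps of 2) and frequencies of the grade distribution; the helper names `count_below` and `count_between` are assumptions made for this example. Unrounded interpolation can differ in the last digit from totals built up from rounded parts.

```python
# Lower class limits, class width, and frequencies of the grade distribution.
width = 2.0
frequencies = [7, 31, 42, 54, 33, 24, 22, 8, 4]
lower_limits = [71.95 + width * k for k in range(len(frequencies))]
n = sum(frequencies)


def count_below(value):
    """Estimated number of grades below `value`, interpolating within a class."""
    total = 0.0
    for lower, f in zip(lower_limits, frequencies):
        if value >= lower + width:
            total += f  # the whole class lies below the value
        elif value > lower:
            total += f * (value - lower) / width  # part of the class lies below
    return total


def count_between(low, high):
    """Estimated number of grades between `low` and `high`."""
    return count_below(high) - count_below(low)


mean, s = 79.61, 3.67
for multiple in (1, 2, 3):
    inside = count_between(mean - multiple * s, mean + multiple * s)
    print(multiple, round(inside, 1), round(100 * inside / n, 1))
```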


* See Appendix K, which gives the areas in one-half of the central portion of the normal curve. More exactly, 68.27 is twice 34.13447; 95.45 is twice 47.72499; 99.73 is twice 49.86501.

Chart 10.5. Proportion of Items within ±1s, ±2s, and ±3s of the Arithmetic Mean in a Normal Curve.

In dealing with the normal curve in later chapters, we shall not confine ourselves to the proportionate areas included within ±s, ±2s, and ±3s of the mean, but shall consider any desired multiples of s. For example, we shall later be interested in knowing that 95 per cent of the items may be found within X̄ ± 1.96s and that 99 per cent may occur within X̄ ± 2.58s. Actually, we shall be more interested in the proportions occurring beyond the limits mentioned, that is, 5 per cent and 1 per cent.


Before leaving the topic of absolute dispersion, it may be of interest to point out that, for any series of values, no matter how they are distributed, it may be shown by Tchebycheff's inequality that the proportion of the values lying within the limits of X̄ ± Ms (where the value of M is greater than one) will be more than

\[ 1 - \frac{1}{M^2}, \]

and the proportion falling beyond the limits of X̄ ± Ms will be less than 1/M². If a distribution is unimodal, and if the difference between the mode and the mean does not exceed s, the Camp-Meidell inequality states that more than

\[ 1 - \frac{1}{2.25M^2} \]

of the values are within X̄ ± Ms and that less than 1/(2.25M²) of the values lie beyond X̄ ± Ms.
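The two bounds are easily tabulated for a few multiples of s. The sketch below is illustrative only; the function names are assumptions, and both functions simply evaluate the expressions stated above for M greater than one.

```python
def tchebycheff_within(m):
    """Lower bound on the proportion within the mean +/- m*s, any distribution."""
    if m <= 1:
        raise ValueError("the bound is stated for M greater than one")
    return 1 - 1 / m**2


def camp_meidell_within(m):
    """Lower bound under the Camp-Meidell conditions (unimodal, mode near the mean)."""
    if m <= 1:
        raise ValueError("the bound is stated for M greater than one")
    return 1 - 1 / (2.25 * m**2)


for m in (1.5, 2, 3):
    print(m, round(tchebycheff_within(m), 4), round(camp_meidell_within(m), 4))
```

For M = 2 the general bound is 75 per cent and the Camp-Meidell bound about 88.9 per cent, both well under the 95.45 per cent of the normal curve, as is to be expected of bounds that assume so little about the form of the distribution.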

The greater the dispersion of a series, the greater the value of s. As a measure of uniformity of the characteristic measured, the smaller the value of s, the greater the uniformity. To avoid this inverse relationship, a modification referred to as a measure of precision is sometimes used, especially with reference to the precision of a series of physical measurements. This measure is

\[ h = \frac{1}{\sqrt{2s^2}}. \]

It is not often used in statistical work in the social sciences.

MEASURES OF RELATIVE DISPERSION

In the preceding paragraphs we have discussed measures of absolute dispersion, all of which are expressed in terms of the units of the problem, which may be dollars, pounds, inches, percentages, and so forth. When we wish to compare the dispersions of two or more series, it may or may not be desirable to use such a measure. The comparison of dispersions of two or more series resolves itself into three possible situations:

(1) The series to be compared may be expressed in the same units, and the means may be the same, or nearly the same, in size. The grades of the cadet-midshipmen showed a mean of 79.61 and a standard deviation of 3.67. If another graduating class showed X̄ = 79.55 and s = 3.50, it is clear that the second class would exhibit less dispersion.

(2) The series to be compared may be expressed in the same units, but the arithmetic means may differ. Some years ago the Goodyear Tire and Rubber Company developed a new type of cord for automobile tires which was designated "Supertwist." The Supertwist cord was superior to ordinary cord in that it could stretch more and had a longer flex life. Tests made on cord as received from the cotton mill and prior to fabrication into tires showed for the flex life of Supertwist cord

X̄ = 138.64 minutes, and s = 15.27 minutes;

while for regular cord the figures were

X̄ = 87.66 minutes, and s = 14.12 minutes.

If we compare the two s values, it appears that Supertwist cord is more variable in respect to flex life than is regular cord. However, it must be noted that the average flex life of Supertwist is much greater than that of regular cord. Taking this factor into consideration, we may set up a measure of relative dispersion,

\[ V = \frac{s}{\bar{X}}. \]

This is the coefficient of variation and is usually expressed as a percentage. For Supertwist cord

\[ V = \frac{15.27}{138.64} = 0.1101, \text{ or 11.0 per cent;} \]

while for regular cord

\[ V = \frac{14.12}{87.66} = 0.1611, \text{ or 16.1 per cent.} \]

It is thus apparent that the relative variation in flex life is much less for Supertwist cord than for regular cord.
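The comparison of the two cords reduces to a single division for each. A minimal sketch, in which the function name `coefficient_of_variation` is an assumption made for illustration:

```python
def coefficient_of_variation(standard_deviation, mean):
    """Relative dispersion V = s / mean, as a proportion."""
    return standard_deviation / mean


# Flex life in minutes: (mean, standard deviation) for each type of cord.
cords = {"Supertwist": (138.64, 15.27), "regular": (87.66, 14.12)}

for name, (mean, s) in cords.items():
    v = coefficient_of_variation(s, mean)
    print(f"{name}: s = {s} minutes, V = {v:.4f} ({100 * v:.1f} per cent)")
```

Although Supertwist has the larger s, it has the smaller V, so the ranking by absolute dispersion is reversed once the difference in means is allowed for.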

Chart 10.6 also illustrates the comparison of dispersions of two series having different mean values. Section A shows the curves of two distributions having the same absolute dispersion but different relative dispersions. In section B are curves of two distributions having quite different absolute dispersions but the same relative dispersions. If the zero is shown on the horizontal scale, as in Chart 10.6, a very rough visual impression may be had of the relative dispersion of a series. For this reason some statisticians think it is desirable to show the zero on the horizontal scale. This does not seem to be a very important matter, however, since relative dispersion can at best be visualized only approximately. Occasionally frequency distributions are formed with class intervals expressed, not in terms of original units, but as percentages of the mean, the interval being some convenient figure, such as 10 per cent of the mean. If two such distributions are plotted on one chart, it is easy to compare visually their relative dispersions.

(3) The series to be compared may be expressed in different units. In such a case the standard deviations cannot be directly compared. A study of a large number of male industrial workers* revealed an average pulse rate of 81.1 beats per minute and a standard deviation of about 12.2 beats per minute. Measurements of height showed X̄ = 66.9 inches and s = 2.7 inches. The measurements of height included a small number of men not measured as to pulse rate. Let us disregard this difficulty for the purposes of our illustration. Are the industrial workers more variable in respect to pulse rate or height? It is obvious that the two standard deviations, being in different units, cannot be compared. Computing the two coefficients of variation shows, for pulse rate,

\[ V = \frac{12.2}{81.1} = 0.149, \text{ or 14.9 per cent,} \]

and, for height,

\[ V = \frac{2.7}{66.9} = 0.040, \text{ or 4.0 per cent.} \]

It is clear that, for this group of men, pulse rate is subject to greater dispersion than is height.

* Based on data in A Health Study of Ten Thousand Male Industrial Workers, pp. 46 and 59. United States Public Health Service, Public Health Bulletin 162.


Chart 10.6. Comparisons of Dispersions of Series Having Different Arithmetic Means. A. Same absolute dispersion, different relative dispersion: left-hand curve, X̄ = 33, s = 10, V = 30.3 per cent; right-hand curve, X̄ = 101, s = 10, V = 9.9 per cent. B. Different absolute dispersion, same relative dispersion: left-hand curve, X̄ = 50, s = 5, V = 10 per cent; right-hand curve, X̄ = 100, s = 10, V = 10 per cent. (Sections A and B have different vertical scales since they are not intended to be compared. However, if the vertical scale of section B is expanded 50 per cent, all curves will have the same area.)

Somewhat akin to our measurement of relative dispersion is the
possibility of expressing a given value in terms of its divergence from
the mean and also in terms of the dispersion of the series. Such a
procedure is not especially useful when we are considering only one value
or comparing two values from the same series. Its usefulness becomes
apparent when we want to compare two values from different series and
when those two series (1) differ in respect to X̄ or s, or both, or (2) are
expressed in different units. Suppose that a certain student has made a
grade of 180 on an intelligence test, and that his group showed X̄ = 160
and s = 15. This same student made a grade of 86 in history, and the
group showed X̄ = 70 and s = 12. We are interested in knowing whether his
relative standing is higher in the intelligence test or in history. In
the intelligence test he was 20 points above the mean, and in history he
was 16 points above the mean. These deviations, however, are not
comparable, but may be rendered so by dividing by their respective
standard deviations. Thus,


$$\text{Intelligence test:}\quad \frac{x}{s} = \frac{180 - 160}{15} = \frac{+20}{15} = +1.33;$$

$$\text{History:}\quad \frac{x}{s} = \frac{86 - 70}{12} = \frac{+16}{12} = +1.33.$$


It is apparent that the student shows the same relative standing in history
and on the intelligence test, being 1.33s above the mean in each. The
usefulness of this device is by no means limited to the educational field.
It is, however, often used with test data and is then referred to as a
“standard score.”
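
The comparison just made can be sketched in Python as follows. The function name is an assumption introduced for illustration; the grades, means, and standard deviations are those given in the text.

```python
def standard_score(value, mean, s):
    """Deviation from the mean expressed in units of the standard deviation."""
    return (value - mean) / s

intelligence = standard_score(180, 160, 15)   # intelligence test
history = standard_score(86, 70, 12)          # history grade

print(f"Intelligence test: {intelligence:+.2f}")
print(f"History:           {history:+.2f}")
```

Because both results are pure numbers, they may be compared even though the two tests are scored on different scales.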


SKEWNESS 

When a series is not symmetrical, it is said to be asymmetrical or
skewed. In Chart 10.2 a skewed curve was shown in relation to a
symmetrical one. The curve of cadet-midshipmen's grades (Chart 10.7) is
also skewed. Measures of skewness indicate not only the amount of
skewness but also the direction. A series is said to be skewed in the
direction of the extreme values, or, speaking in terms of the curve, in
the direction of the excess tail. Thus the two curves referred to above
are both skewed positively, or to the right. Most skewed curves
encountered in the social sciences are skewed to the right. Only rarely
do we find curves skewed to the left, as in Chart 10.8, and even more
rarely do we find data characteristically skewed to the left.

Many series, however, are characteristically skewed to the right.
Examples are frequency distributions of wages or salaries, use of
electricity (see Chart 23.13), weights of adult male human beings, and
numerous other variables. Distributions of grades are apt to be
moderately skewed to the right, or nearly symmetrical. In the case of the
cadet-midshipmen's grades, the skewness is partly due to the fact that
we are considering only those men who had survived the previous three
years, during which some of the less able had been dropped. The
distribution of ages at death of the American inventors in Chart 10.8 may
be characteristically skewed to the left, since younger men do not often
have enough inventions to their credit to be classified as “inventors,” or
the skewness may be due to the fact that a time factor is present: almost
one-fifth of the inventors included in this study were born before 1800.

[Chart 10.7: vertical axis, number of cadet-midshipmen.]

Chart 10.7. Location of Arithmetic Mean, Median, and Mode for Grades
of the 1952 Graduating Class of the United States Merchant Marine
Academy.

[Chart 10.8: vertical axis, number of inventors.]

Chart 10.8. Age at Death of 371 American Inventors. Data from
“Bio-Social Characteristics of American Inventors,” by Sanford Winston,
American Sociological Review, Vol. 2, No. 6, pp. 837-849.

Pearsonian measure of skewness. It was pointed out in the preceding
chapter that the mode is not influenced by the presence of extreme
values, the median is influenced by their position only, and the arithmetic
mean is influenced by the size of the extremes. Consequently we could
make use of the mode and the mean to measure skewness. We might
say, then, that skewness = mean - mode. But there are some
shortcomings of such a measure. In the first place, being a measure of
absolute skewness, it would be in terms of the units of the problem.
Furthermore, it would have much different meaning for a series of small
dispersion than for a widely dispersed series. Statisticians almost never
use a measure of absolute skewness, preferring a measure of relative
skewness. The measure just mentioned may be put into relative terms
and the two difficulties overcome by dividing by s. Now

$$\text{skewness} = \frac{\bar{X} - \text{Mo}}{s}.$$


This gives us a relative measure with positive sign when skewness is to
the right, and with negative sign when skewness is to the left. There is,
however, another important difficulty growing out of the fact that the
mode for most frequency distributions is only an approximation. The
median may be more satisfactorily located, and therefore we use the
measure*

$$Sk = \frac{3(\bar{X} - \text{Med})}{s}.$$

In the preceding chapter it was found that X̄ = 79.61 and Med = 79.15
for the cadet-midshipmen's grades. In this chapter the value of s was
ascertained to be 3.67. The skewness, then, is

$$Sk = \frac{3(79.61 - 79.15)}{3.67} = +0.376.$$


* The presence of the 3 in the expression is explained as follows: Karl Pearson
showed empirically that, in moderately skewed distributions of a continuous variable,
the median tends to fall about 2/3 of the distance from the mode toward the mean.
Consequently he wrote Mo = X̄ - 3(X̄ - Med) and, substituting this expression
for the mode in the measure of skewness, obtained

$$Sk = \frac{\bar{X} - [\bar{X} - 3(\bar{X} - \text{Med})]}{s} = \frac{3(\bar{X} - \text{Med})}{s}.$$



DISPERSION, SKEWNESS, AND KURTOSIS [Chap. 10 


TABLE 10.5

Computation of Various Measures for Age at Death of 371 American
Inventors

Age at death in years      f    d'    fd'   f(d')²   f(d')³
 35 and under  40          3    -6    -18     108     -648
 40 and under  45          6    -5    -30     150     -750
 45 and under  50         12    -4    -48     192     -768
 50 and under  55         16    -3    -48     144     -432
 55 and under  60         26    -2    -52     104     -208
 60 and under  65         40    -1    -40      40      -40
 65 and under  70         50     0      0       0        0
 70 and under  75         56     1     56      56       56
 75 and under  80         62     2    124     248      496
 80 and under  85         55     3    165     495    1,485
 85 and under  90         25     4    100     400    1,600
 90 and under  95         17     5     85     425    2,125
 95 and under 100          2     6     12      72      432
100 and over*              1     7      7      49      343
Total                    371         +313   2,483   +3,691

* This class was treated as having a mid-value of 102.5 (d' = 7).

Data from Sanford Winston, “Bio-Social Characteristics of American Inventors,”
American Sociological Review, Vol. 2, No. 6, p. 843, and by correspondence.

$$\bar{X} = 67.5 + \frac{313}{371} \times 5 = 71.72 \text{ years.}$$

$$\frac{N}{2} = \frac{371}{2} = 185.5.$$

$$\text{Med} = 70 + \frac{32.5}{56} \times 5 = 72.90 \text{ years.}$$

$$s = 5\sqrt{\frac{2{,}483}{371} - \left(\frac{313}{371}\right)^2} = 12.23 \text{ years.}$$

$$\nu_1 = \frac{\sum fd'}{N} = \frac{+313}{371} = +0.843666.$$

$$\nu_2 = \frac{\sum f(d')^2}{N} = \frac{2{,}483}{371} = 6.692722.$$

$$\nu_3 = \frac{\sum f(d')^3}{N} = \frac{+3{,}691}{371} = +9.948787.$$

$$\pi_1 = 0.$$

$$\pi_2 = \nu_2 - \nu_1^2 = 6.692722 - (0.843666)^2 = 5.980950.$$

$$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = 9.948787 - 3(0.843666)(6.692722) + 2(0.843666)^3 = -5.789483.$$


This may be considered as a moderate degree of skewness, since the
measure varies within the limits* of ±3. It should be added that values as
large as ±1 are rather unusual.

* Harold Hotelling and Leonard M. Solomons (“The Limits of a Measure of
Skewness,” Annals of Mathematical Statistics, May 1932, pp. 141-142) have shown that
(X̄ - Med)/s lies within the limits ±1.

For the data of age at death of the American inventors, it is shown
under Table 10.5 that X̄ = 71.72 years, while Med = 72.90 years and
s = 12.23 years. The Pearsonian measure of skewness is

$$Sk = \frac{3(71.72 - 72.90)}{12.23} = -0.29.$$
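
Both Pearsonian results can be reproduced with a short Python sketch. The function name is an assumption made here for illustration; the means, medians, and standard deviations are those quoted in the text.

```python
def pearson_skewness(mean, median, s):
    """Pearsonian measure of skewness, Sk = 3(mean - median) / s."""
    return 3.0 * (mean - median) / s

grades = pearson_skewness(79.61, 79.15, 3.67)       # cadet-midshipmen's grades
inventors = pearson_skewness(71.72, 72.90, 12.23)   # age at death of inventors

print(f"Grades:    Sk = {grades:+.3f}")
print(f"Inventors: Sk = {inventors:+.2f}")
```

The sign alone gives the direction: positive for the grades (skewed to the right), negative for the inventors (skewed to the left).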


Measures of skewness based on quartiles and percentiles.
Skewness may also be measured by means of the quartile measure of
skewness,

$$\frac{(Q_3 - \text{Med}) - (\text{Med} - Q_1)}{Q_3 - Q_1} = \frac{Q_1 + Q_3 - 2\,\text{Med}}{Q_3 - Q_1},$$

and by use of an expression employing the 10th and 90th percentiles,

$$\frac{(P_{90} - \text{Med}) - (\text{Med} - P_{10})}{P_{90} - P_{10}} = \frac{P_{10} + P_{90} - 2\,\text{Med}}{P_{90} - P_{10}}.$$

Since these measures suffer from shortcomings similar to those previously
mentioned for measures of dispersion based on quartiles and percentiles,
they are not altogether satisfactory measures of skewness, and no further
consideration will be given to them here.
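
Purely as an illustration of the arithmetic, the two expressions may be written as small Python functions. The function names and the sample quartile and percentile values below are hypothetical and do not come from the text.

```python
def quartile_skewness(q1, median, q3):
    """(Q1 + Q3 - 2 Med) / (Q3 - Q1)."""
    return (q1 + q3 - 2 * median) / (q3 - q1)

def percentile_skewness(p10, median, p90):
    """(P10 + P90 - 2 Med) / (P90 - P10)."""
    return (p10 + p90 - 2 * median) / (p90 - p10)

# Hypothetical values, chosen only to show the sign convention.
right_skewed = quartile_skewness(10, 14, 22)    # upper quartile farther from median
symmetrical = percentile_skewness(5, 20, 35)    # percentiles equidistant from median

print(right_skewed, symmetrical)
```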

Measure of skewness based on the third moment. We have seen
that the most satisfactory measure of dispersion is the standard deviation,
which is based upon the second moment about the mean,

$$\pi_2 = \frac{\sum x^2}{N} \quad \text{and} \quad s = \sqrt{\pi_2} = \sqrt{\frac{\sum x^2}{N}}.$$

A measure of skewness may be obtained by making use of the third
moment about the mean,

$$\pi_3 = \frac{\sum x^3}{N}.$$

It will be recalled that the first moment about the mean,

$$\pi_1 = \frac{\sum x}{N},$$

is always zero. However, the third moment about the mean is not zero
unless the distribution is symmetrical about the mean. Cubing a
deviation does not change its sign. It does, however, have a
disproportionately large effect on large deviations. As illustrations,
consider the two sets of data given in Tables 10.6 and 10.7, the first of
which is symmetrical around a mean of 6, while the second is not
symmetrical around a mean of 6. Both sets of data have

$$\pi_1 = \frac{\sum x}{N} = 0,$$

and the data of Table 10.6 have

$$\pi_3 = \frac{\sum x^3}{N} = 0.$$

But the figures in Table 10.7 show

$$\pi_3 = \frac{\sum x^3}{N} = +6.$$


TABLE 10.6

Computation of First and Third Moments of a Symmetrical Series

   X      x      x³
   2     -4     -64
   4     -2      -8
   6      0       0
   8     +2      +8
  10     +4     +64
  30      0       0

$$\pi_1 = \frac{\sum x}{N} = \frac{0}{5} = 0. \qquad \pi_3 = \frac{\sum x^3}{N} = \frac{0}{5} = 0.$$

TABLE 10.7

Computation of First and Third Moments of an Asymmetrical Series

   X      x      x³
   3     -3     -27
   4     -2      -8
   6      0       0
   7     +1      +1
  10     +4     +64
  30      0     +30

$$\pi_1 = \frac{\sum x}{N} = \frac{0}{5} = 0. \qquad \pi_3 = \frac{\sum x^3}{N} = \frac{+30}{5} = +6.$$

To compute the third moment of a frequency distribution,

$$\pi_3 = \frac{\sum fx^3}{N},$$

taking the actual deviations from the arithmetic mean, cubing them,
multiplying by the frequencies, summing, and dividing by N, would be
laborious. As shown in Appendix S, section 10.2, the second moment, s²
or π₂, can be obtained by a short process. In terms of class intervals
squared,

$$\pi_2 = \frac{\sum f(d')^2}{N} - \left(\frac{\sum fd'}{N}\right)^2.$$

The value of the third moment (in terms of class intervals raised to the
third power) is given by*

$$\pi_3 = \frac{\sum f(d')^3}{N} - 3\left(\frac{\sum fd'}{N}\right)\left(\frac{\sum f(d')^2}{N}\right) + 2\left(\frac{\sum fd'}{N}\right)^3.$$

Or, letting

$$\nu_1 = \frac{\sum fd'}{N}, \quad \nu_2 = \frac{\sum f(d')^2}{N}, \quad \text{and} \quad \nu_3 = \frac{\sum f(d')^3}{N},$$

$$\pi_2 = \nu_2 - \nu_1^2$$

and

$$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3.$$

Obviously, π₃ is a measure of absolute skewness. The measure of
relative skewness is

$$\beta_1 = \frac{\pi_3^2}{\pi_2^3},$$

where both numerator and denominator are in terms of class intervals
raised to the sixth power. Skewness is also sometimes measured by
α₃, where

$$\alpha_3 = \sqrt{\beta_1} = \frac{\pi_3}{\sqrt{\pi_2^3}}.$$

α₃ may be given the sign accompanying π₃. We shall make use of this
measure in fitting a skewed curve in Chapter 23.

The values of the second and third moments for the data of
cadet-midshipmen's grades are shown below Table 10.8. From these we obtain

$$\beta_1 = \frac{\pi_3^2}{\pi_2^3} = \frac{(2.642053)^2}{(3.376276)^3} = 0.18.$$

Similarly, the second and third moments for the age at death of the
American inventors have been computed in Table 10.5. From these we
obtain

$$\beta_1 = \frac{\pi_3^2}{\pi_2^3} = \frac{(-5.789483)^2}{(5.980950)^3} = 0.16.$$
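
The short process for grouped data lends itself to a brief Python sketch. The function names are assumptions made for illustration, and the frequencies are entered as they are read from Tables 10.5 and 10.8, with d' measured in class intervals from the assumed mean.

```python
def raw_moments(freqs, steps, orders=(1, 2, 3)):
    """Moments about the assumed mean (the nu values), in class-interval units."""
    n = sum(freqs)
    return {k: sum(f * d ** k for f, d in zip(freqs, steps)) / n for k in orders}

def skewness_from_groups(freqs, steps):
    """Return (pi2, pi3, beta1) for a frequency distribution."""
    v = raw_moments(freqs, steps)
    pi2 = v[2] - v[1] ** 2
    pi3 = v[3] - 3 * v[1] * v[2] + 2 * v[1] ** 3
    return pi2, pi3, pi3 ** 2 / pi2 ** 3

# Frequencies as read from Table 10.8 (grades) and Table 10.5 (inventors).
grades_f = [7, 31, 42, 54, 33, 24, 22, 8, 4]
grades_d = list(range(-3, 6))        # d' runs from -3 to +5
inventors_f = [3, 6, 12, 16, 26, 40, 50, 56, 62, 55, 25, 17, 2, 1]
inventors_d = list(range(-6, 8))     # d' runs from -6 to +7

for name, f, d in (("Grades", grades_f, grades_d),
                   ("Inventors", inventors_f, inventors_d)):
    pi2, pi3, beta1 = skewness_from_groups(f, d)
    print(f"{name}: pi2 = {pi2:.6f}, pi3 = {pi3:.6f}, beta1 = {beta1:.2f}")
```

Note that β₁ is a squared quantity and so is positive for both series; the direction of skewness must be read from the sign of π₃.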


* See Appendix S, section 10.3.

No previous mention has been made of α₁ or α₂. For any series of figures,





Since π₃ = 0 when no skewness is present, it follows that a perfectly
symmetrical series will have β₁ = 0. The greater the value of β₁, the
more skewness there is in a series. At this point we are not in a position
to say whether either of the two values just given for β₁ is significantly
greater than zero. We shall consider this problem in Chapter 20.


TABLE 10.8

Computation of the First Three Moments for Grades of the
1952 Graduating Class of the United States Merchant Marine
Academy

              Number of
              cadet-midshipmen
Grade               f           d'     fd'    f(d')²    f(d')³
72.0-73.9           7           -3     -21      63      -189
74.0-75.9          31           -2     -62     124      -248
76.0-77.9          42           -1     -42      42       -42
78.0-79.9          54            0       0       0         0
80.0-81.9          33           +1     +33      33       +33
82.0-83.9          24           +2     +48      96      +192
84.0-85.9          22           +3     +66     198      +594
86.0-87.9           8           +4     +32     128      +512
88.0-89.9           4           +5     +20     100      +500
Total             225                  +74     784    +1,352

$$\nu_1 = \frac{\sum fd'}{N} = \frac{+74}{225} = +0.328889.$$

$$\nu_2 = \frac{\sum f(d')^2}{N} = \frac{784}{225} = 3.484444.$$

$$\nu_3 = \frac{\sum f(d')^3}{N} = \frac{+1{,}352}{225} = +6.008889.$$

$$\pi_1 = 0.$$

$$\pi_2 = \nu_2 - \nu_1^2 = 3.484444 - (0.328889)^2 = 3.376276.$$

$$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = 6.008889 - 3(0.328889)(3.484444) + 2(0.328889)^3 = 2.642053.$$


KURTOSIS

Chart 10.9 shows a leptokurtic distribution. A platykurtic distribution
is shown in Chart 10.10. The normal curve is designated mesokurtic.*
The degree of kurtosis present in a series may be measured by making use
of the fourth moment,

$$\pi_4 = \frac{\sum x^4}{N},$$

or, for a frequency distribution,

$$\pi_4 = \frac{\sum fx^4}{N}.$$

* Kurtic = humpbacked; thus, humped or unimodal. Lepto = slender, narrow.
Platy = broad, wide, flat. Meso = in the middle, intermediate.

By a procedure similar to that given in Appendix S, section 10.3, it may
be shown that

$$\pi_4 = \frac{\sum f(d')^4}{N} - 4\left(\frac{\sum fd'}{N}\right)\left(\frac{\sum f(d')^3}{N}\right) + 6\left(\frac{\sum fd'}{N}\right)^2\left(\frac{\sum f(d')^2}{N}\right) - 3\left(\frac{\sum fd'}{N}\right)^4,$$

or letting

$$\nu_4 = \frac{\sum f(d')^4}{N},$$

$$\pi_4 = \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4.$$

Now π₄ gives an absolute expression for kurtosis. This may be put
into relative terms by dividing by π₂². The measure is known as β₂ or
α₄, and

$$\beta_2 = \alpha_4 = \frac{\pi_4}{\pi_2^2},$$

where both numerator and denominator are in terms of class intervals
raised to the fourth power. This expression has a value of 3.0 for the
normal curve. For a platykurtic curve, β₂ < 3.0. For a leptokurtic
curve, β₂ > 3.0.

The leptokurtic curve of Chart 10.9 is shown in comparison with a
normal curve having the same N, X̄, and s. In Table 10.9 the moments of
this distribution have been computed and β₂ = 4.46.

The platykurtic curve in Chart 10.10 is also shown in relation to a
normal curve having the same N, X̄, and s. The moments of the
platykurtic series are shown in Table 10.10, and from these β₂ is found
to be 2.22.

[Chart 10.9: horizontal axis, cost in thousands of dollars.]

Chart 10.9. Cost of New House and Lot to Purchaser (Solid Line) and
Normal Curve (Broken Line) Having Same N, X̄, and s. Cleveland, 1924.
Based on data of Table 10.9.

[Chart 10.10: vertical axis, percentage frequencies.]

Chart 10.10. Length of Life of a Group of Electric Lamps (Solid Line)
and Normal Curve (Broken Line) Having Same N, X̄, and s. Based on data
of Table 10.10. The tails of the normal curve are not shown. The left
tail would cross the Y axis.

TABLE 10.9


Computation of First Four Moments and of β₂ for Cost of New
5-Room Wood House and Lot to Purchaser, Cleveland, 1924

Cost
(mid-values)     f     d'    fd'   f(d')²   f(d')³   f(d')⁴
  $1,500         2     -5    -10     50     -250     1,250
   2,500         1     -4     -4     16      -64       256
   3,500         2     -3     -6     18      -54       162
   4,500         6     -2    -12     24      -48        96
   5,500        16     -1    -16     16      -16        16
   6,500        27      0      0      0        0         0
   7,500        16      1     16     16       16        16
   8,500         7      2     14     28       56       112
   9,500         3      3      9     27       81       243
  10,500         1      4      4     16       64       256
  11,500         1      5      5     25      125       625
  Total         82             0    236      -90     3,032

Data from Frank R. Garfield and William M. Hoad, “Construction Costs and Real
Property Values,” Journal of the American Statistical Association, Vol. 32, No. 200,
December 1937, p. 647. Data are those shown in Chart I for 5-room wood houses.

$$\nu_1 = \frac{\sum fd'}{N} = \frac{0}{82} = 0.$$

$$\nu_2 = \frac{\sum f(d')^2}{N} = \frac{236}{82} = 2.878049.$$

$$\nu_3 = \frac{\sum f(d')^3}{N} = \frac{-90}{82} = -1.097561.$$

$$\nu_4 = \frac{\sum f(d')^4}{N} = \frac{3{,}032}{82} = 36.975610.$$

$$\pi_1 = 0.$$

$$\pi_2 = \nu_2 - \nu_1^2 = 2.878049.$$

$$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = -1.097561.$$

$$\pi_4 = \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4 = 36.975610.$$

$$\beta_2 = \frac{\pi_4}{\pi_2^2} = \frac{36.975610}{(2.878049)^2} = 4.46.$$

Note. The assumed mean ($6,500) and the mean coincide, resulting in a value of 0
for ν₁. There are therefore no differences between the ν and π values, since ν₁ = 0,
ν₁ν₃ = 0, ν₁²ν₂ = 0, etc.
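
The value of β₂ for the house-cost data can be recomputed with the sketch below. The function name is an assumption introduced here, and the frequencies are entered as read from Table 10.9, with d' in class intervals from the assumed mean of $6,500.

```python
def beta_2(freqs, steps):
    """Relative kurtosis pi4 / pi2**2 from grouped data in class-interval units."""
    n = sum(freqs)
    v1, v2, v3, v4 = (
        sum(f * d ** k for f, d in zip(freqs, steps)) / n for k in (1, 2, 3, 4)
    )
    pi2 = v2 - v1 ** 2
    pi4 = v4 - 4 * v1 * v3 + 6 * v1 ** 2 * v2 - 3 * v1 ** 4
    return pi4 / pi2 ** 2

# Frequencies as read from Table 10.9; d' runs from -5 to +5.
houses_f = [2, 1, 2, 6, 16, 27, 16, 7, 3, 1, 1]
houses_d = list(range(-5, 6))

print(f"beta2 = {beta_2(houses_f, houses_d):.2f}")
```

A result above 3.0 marks the distribution as leptokurtic; the same function applied to a platykurtic series gives a value below 3.0.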


When a deviation is raised to a fourth or a second power, its sign
becomes positive. The fourth power increases extreme deviations
disproportionately in comparison with raising them to the second power.
Consequently the narrower the shoulders of a distribution and the longer
the tails, the greater will be π₄ in relation to π₂.

In Chapter 26 we shall consider a method of ascertaining whether a
value of β₂ is significantly less than or greater than 3.0.

TABLE 10.10


Computation of First Four Moments and of β₂ for Length of Life of a Group
of Electric Lamps

Length of life   Percentage
in hours         frequencies
(mid-values)         f        d'     fd'     f(d')²     f(d')³      f(d')⁴
      50            1.0       -9     -9.0      81.0     -729.0     6,561.0
     150            1.5       -8    -12.0      96.0     -768.0     6,144.0
     250            3.1       -7    -21.7     151.9   -1,063.3     7,443.1
     350            4.4       -6    -26.4     158.4     -950.4     5,702.4
     450            5.0       -5    -25.0     125.0     -625.0     3,125.0
     550            5.7       -4    -22.8      91.2     -364.8     1,459.2
     650            6.6       -3    -19.8      59.4     -178.2       534.6
     750            7.3       -2    -14.6      29.2      -58.4       116.8
     850            7.6       -1     -7.6       7.6       -7.6         7.6
     950            7.8        0      0         0          0           0
   1,050            7.8        1      7.8       7.8        7.8         7.8
   1,150            7.6        2     15.2      30.4       60.8       121.6
   1,250            7.3        3     21.9      65.7      197.1       591.3
   1,350            6.6        4     26.4     105.6      422.4     1,689.6
   1,450            5.7        5     28.5     142.5      712.5     3,562.5
   1,550            5.0        6     30.0     180.0    1,080.0     6,480.0
   1,650            4.4        7     30.8     215.6    1,509.2    10,564.4
   1,750            3.1        8     24.8     198.4    1,587.2    12,697.6
   1,850            1.5        9     13.5     121.5    1,093.5     9,841.5
   1,950            1.0       10     10.0     100.0    1,000.0    10,000.0
   Total          100.0             +50.0   1,967.2   +2,925.8    86,650.0

Data from Robley Winfrey and Edwin B. Kurtz, Life Characteristics of Physical
Property, Bulletin 103, Iowa Engineering Experiment Station.

$$\nu_1 = \frac{\sum fd'}{N} = \frac{+50.0}{100.0} = +0.50.$$

$$\nu_2 = \frac{\sum f(d')^2}{N} = \frac{1{,}967.2}{100.0} = 19.672.$$

$$\nu_3 = \frac{\sum f(d')^3}{N} = \frac{+2{,}925.8}{100.0} = +29.258.$$

$$\nu_4 = \frac{\sum f(d')^4}{N} = \frac{86{,}650.0}{100.0} = 866.500.$$

$$\pi_1 = 0.$$

$$\pi_2 = \nu_2 - \nu_1^2 = 19.672 - (0.50)^2 = 19.422.$$

$$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = 29.258 - 3(0.50)(19.672) + 2(0.50)^3 = 0.$$

$$\pi_4 = \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4 = 866.500 - 4(0.50)(29.258) + 6(0.50)^2(19.672) - 3(0.50)^4 = 837.3045.$$

$$\beta_2 = \frac{\pi_4}{\pi_2^2} = \frac{837.3045}{(19.422)^2} = 2.22.$$

CORRECTION OF THE MOMENTS FOR GROUPING ERROR

In computing the mean, π₂ (or s), π₃, and π₄ for frequency distributions,
we made use of the mid-values of the classes as representative values.
We saw, in the previous chapter, that the mid-values were incorrect
assumptions but that the errors present tend to offset each other when we
compute the arithmetic mean. This offsetting is also present when the
third moment is computed. It will be remembered that the mid-values
of the classes preceding the modal class tend to be too small, while the
mid-values of the classes following the modal class tend to be too large.
The result is that the various x values tend to be slightly larger (in
absolute value) than they should be, and no offsetting occurs when they
are squared or raised to the fourth power. Consequently the value of π₂
(and s) and the value of π₄ are apt to be slightly larger than the values
computed from the same data ungrouped. Sheppard's corrections*
attempt to offset this upward bias. The corrected moments are
indicated by μ and are

$$\mu_1 = \pi_1 = 0,$$

$$\mu_2 = \pi_2 - \tfrac{1}{12},$$

$$\mu_3 = \pi_3,$$

$$\mu_4 = \pi_4 - \tfrac{1}{2}\pi_2 + \tfrac{7}{240},$$

where all computations are in terms of class intervals.
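
As a sketch of the arithmetic only, the corrections can be applied in Python. The function name is an assumption introduced here; the input moments are those obtained under Table 10.9 and are used merely to show the mechanics, without any claim that the conditions for applying the corrections hold for that series.

```python
def sheppard_corrected(pi2, pi3, pi4):
    """Apply Sheppard's corrections to moments expressed in class-interval units."""
    mu2 = pi2 - 1.0 / 12.0
    mu3 = pi3                      # the third moment is left unchanged
    mu4 = pi4 - 0.5 * pi2 + 7.0 / 240.0
    return mu2, mu3, mu4

# Moments about the mean as computed under Table 10.9.
mu2, mu3, mu4 = sheppard_corrected(2.878049, -1.097561, 36.975610)

print(f"mu2 = {mu2:.6f}, mu3 = {mu3:.6f}, mu4 = {mu4:.6f}")
```

Only the even moments are reduced, which matches the argument that the errors offset one another for the mean and the third moment but not for the second and fourth.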

If we were to use the class means instead of the class mid-values, the
arithmetic mean could be computed accurately. However, if class means
were used, the values of π₂ and π₄ would still be smaller than if
computed from the same data ungrouped. We shall give an arithmetic
illustration to show that, when the mean of each of several groups of
figures is substituted for those figures, s for the series is decreased;
that is, it has a downward bias.

Consider the two following sets of data. The first contains nine
different values; the second shows the mean of the first three items
repeated three times, the mean of the second three items repeated three
times, and the mean of the last three items repeated three times. The
standard deviation of the nine different items is 2.58, but the standard
deviation of the three groups of means is 2.45.

* For a development, see C. C. Peters and W. R. Van Voorhis, Statistical
Procedures and Their Mathematical Bases, McGraw-Hill Book Co., Inc., New York, 1940,
pp. 72-73 and 84-89.

   Original items        Group means
    X       X²            X       X²
    1        1            2        4
    2        4            2        4
    3        9            2        4
    4       16            5       25
    5       25            5       25
    6       36            5       25
    7       49            8       64
    8       64            8       64
    9       81            8       64
   45      285           45      279

$$s = \sqrt{\frac{285}{9} - \left(\frac{45}{9}\right)^2} = 2.58. \qquad s = \sqrt{\frac{279}{9} - \left(\frac{45}{9}\right)^2} = 2.45.$$
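
The downward bias can be reproduced with a few lines of Python. The sketch below assumes only the nine values of the illustration; the function name and the grouping loop are introduced here for the purpose.

```python
from math import sqrt

def std_dev(values):
    """Standard deviation as sqrt(sum of X**2 / N - mean**2)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum(x * x for x in values) / n - mean ** 2)

original = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Replace each successive group of three items by the mean of that group.
group_means = []
for start in range(0, len(original), 3):
    group = original[start:start + 3]
    group_means.extend([sum(group) / len(group)] * len(group))

print(f"s of original items = {std_dev(original):.2f}")
print(f"s of group means    = {std_dev(group_means):.2f}")
```

The totals of the two series agree, so the mean is unaffected; only the sum of squares, and therefore s, is reduced by the substitution.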


If a distribution is so flat that the mid-value of each class closely
approximates the corresponding class mean, the value of s (and π₂ and
π₄) based on those mid-values may have a downward bias. Such a
situation is unusual.

Sheppard's corrections may be applied when we are dealing with a
continuous variable which, graphically, approaches the X-axis
asymptotically at both ends of the distribution. This latter
characteristic is often referred to as “high contact with the X-axis.” If
these conditions do not obtain, Sheppard's corrections should not be
used, as the corrections may over-correct.* Neither is there
justification for applying Sheppard's corrections if the original
observations have not been made with reasonable accuracy.

In Table 10.4 the value of s was found to be 3.67. If s is computed
from the ungrouped data of Table 8.1, the value obtained is 3.66. Let us
apply Sheppard's correction to the value of s obtained from the frequency
distribution. From expressions previously given, it is apparent that

$$s_c = i\sqrt{\pi_2 - 0.0833},$$

where s_c is the standard deviation corrected for grouping error and i is
the class interval. From Table 10.4 we get

$$s_c = 2\sqrt{3.376276 - 0.0833} = 3.63.$$

Sheppard's correction has over-corrected.

* See footnote 11 in Chapter 23. Consult also G. R. Davies and W. F. Crowder,
Methods of Statistical Analysis in the Social Sciences, John Wiley and Sons, New York,
1933, pp. 81-82, and W. A. Shewhart, Economic Control of Quality of Manufactured
Product, D. Van Nostrand Co., New York, 1931, pp. 78-79.
capita figures might also be shown. The per capita sales also show an
upward trend which falls off only a little from the trend of total sales to 
residential and domestic consumers. Per capita sales have grown, among 
other reasons, because of a continuing improvement in the level of living, 
which includes a wider use of electricity in the home as well as the avail- 
ability of electricity to more homes. 


[Chart 11.1: deposits in dollars plotted by year, 1935 to 1953.]

Chart 11.1. Deposits in New York State Savings Banks, January 1935-
December 1953. Data from various issues and supplements of the Survey of
Current Business.

Other factors, too, may be responsible for the growth in a time series.
The natural sciences have been applied to industry and to agriculture so
as to increase their output enormously. Not always keeping pace with
these technological changes, but induced by them, have been changes in
business organization and methods. The growth of the corporation has
permitted the accumulation of sufficient capital for specialization and
mass production. Scientific management, personnel management, and
quality control have also played important parts in increasing the
productivity of industry. Automation will, undoubtedly, continue to
increase industrial productivity. Improved methods of marketing and
better shipping facilities have made commodities available at times and
places where they were not to be had earlier.

Not all chronological series show upward trends. Some, like the crude
death rate, shown in Chart 11.3, exhibit a downward trend. This
particular declining trend is attributable to better and more widely
available medical knowledge and, in a large sense, reflects again a higher
level of living. An economic series may have a downward trend because a
better or cheaper substitute became available. Thus, synthetic fibers,
such as rayon and nylon, have partially replaced natural fibers for some
uses, and synthetic detergents are being used in place of certain types of
soap. More spectacular, though far beyond the memory of most of us,
was the development of the railroads, which forced into obsolescence most
of the canals in this country. Now the railroads find themselves hard
pressed by competition from trucks, buses, and airplanes.

[Chart 11.2: logarithmic vertical scale; one curve in billions of kilowatt
hours, the other in kilowatt hours per capita.]

Chart 11.2. Sales and Per Capita Sales of Electric Power to Residential
or Domestic Consumers in the United States. Data from U. S. Department
of Commerce, Office of Business Economics, Business Statistics; Survey of
Current Business, March 1951; and various issues of Current Population
Reports.

Improvements in the productive process are apt to be rapid at first,
and demand may be brisk. However, as time goes on, it is often true
that further technical and managerial improvements have less and less
effect on output, while at the same time the market does not continue to
expand as rapidly as before. Growth may also be retarded because of the
increasing difficulty of obtaining raw material, such as minerals which
must be obtained from smaller deposits and lower-grade ores. We cannot
undertake a complete listing of the factors, including financial ones,
which often combine to slow up the growth of production in an industry.
Whatever the particular causes may be in a given industry, many
authorities believe that not only does relative growth tend to decline, but
eventually further expansion will be physically impossible. Raymond B.
Prescott has characterized the tendency we have described as a “law of
growth,”* which, he says, applies to all industries. This law embraces
four stages: (1) period of experimentation, during which the amount of
growth is small; (2) period of growth into the social fabric; (3) period
during which growth is retarded as a saturation point is approached; (4)
period of stability. Charts 11.4A and 11.4B indicate that the domestic
consumption of rayon filament yarn behaves in this manner. From the
first of these charts it is seen that, over the period 1912-1952, the annual
amount of growth was initially small but gradually increased; from the
second chart it is clear that the annual percentage of growth has gradually
declined.

[Chart 11.3: vertical axis, deaths per 1000.]

Chart 11.3. Crude Death Rate in the Registration Area of the United
States, 1900-1953. Data from F. E. Linder and R. D. Grove, Vital Statistics in the
United States, National Office of Vital Statistics, Washington, 1947, pp. 122-124;
Statistical Abstract of the United States, 1953, p. 61; and National Office of Vital
Statistics, Monthly Vital Statistics Report, July 21, 1953 and February 17, 1954.

* “Law of Growth in Forecasting Demand,” by Raymond B. Prescott, Journal
of the American Statistical Association, December 1922, Vol. XVIII, pp. 471-479.

As previously suggested, sometimes the competition faced by an
industry is so keen, or its source of supply so limited, that it experiences a
transition from growth to decline. Such an industry is anthracite coal
mining. The production of anthracite coal for 1880-1953 is shown in
Chart 11.5.

We may study the trend of a time series because we are interested in
the trend itself, or we may wish to eliminate the trend statistically in
order to throw into relief one or more other movements in the series.
The statistical problem consists, first, of deciding the type of trend which
will fit the data adequately and which is a logical description of the data,
and, second, of fitting the trend of the type selected.

[Chart 11.5: vertical axis, millions of tons.]

Chart 11.5. Production of Pennsylvania Anthracite, 1880-1953. Data
from U. S. Department of Commerce, Historical Statistics of the United States,
1789-1945, p. 112; Business Statistics, 1953, p. 108; and Survey of Current
Business, February 1954.

Periodic movements. A periodic movement is one which recurs,
with some degree of regularity, within a definite period. The most
frequently studied periodic movement is that which occurs within a year
and which is known as seasonal variation, or merely seasonal. Chart 11.6
shows the monthly farm production of milk from January 1941 through
December 1952. The seasonal movement in this chart is quite marked
in relation to the other movements. Notice that the seasonal variation
of milk production is much the same from year to year. This is true, too,
for the data of consumption of newsprint by United States publishers, the
typical seasonal for which is shown in Chart 11.7. In Chapter 14 we
shall see how to ascertain the seasonal pattern when that pattern is
constant or approximately so. However, many series show a seasonal
pattern that is gradually changing with the passage of time. The amount
of advertising space in magazines is such a series, and we shall determine
the seasonal pattern for data of United States magazine advertising in
Chapter 15.

Chart 11.6. Milk Production on Farms in the United States, January
1941-December 1952. Data from Bureau of Agricultural Economics, Farm
Production, Disposition, and Income from Milk, Table 1.

[Chart 11.7: vertical axis, per cent.]

Chart 11.7. Seasonal Index of Consumption of Newsprint by United
States Publishers.

Climatic conditions, including variations in rainfall, snow and ice,
sunshine, humidity, heat, and wind, produce variations in demand which
are often reflected in variations in production. Climatic conditions also
directly affect production in some industries, for example, agriculture and
outdoor construction. Although nature is primarily responsible for most
of the seasonal variations exhibited by time series, there are other factors,
too. The custom of giving gifts at Christmas causes a marked peak in
retail (especially department store) sales in December. Other such
peaks may be expected to appear if advertisers are successful in
promoting widespread gift-giving on Mother's Day and Father's Day.
Peaks of retail activity before Easter and Thanksgiving are indirectly
attributable to the seasons, since those holidays owe their origin in part
to weather conditions. However, the urge to change the style of one's
clothing or automobile in the spring or fall is also partly the result of
ostentation.

The seasonal movement of automobile sales (and the production of
automobiles and parts as well) is not only due to climatic changes but
is also the result of certain man-made decisions. In 1935, in an attempt
to spur a sluggish economy, the automobile show, which would normally
have been held in January 1936, was moved ahead to November 1935.
With new models being brought out several months earlier than
previously, there was, of course, a sudden shift in the seasonal pattern.
New models of the various makes of cars are not now introduced at exactly
the same time, but nearly all appear within a month or two of each other.
The introduction of new models, particularly if they embody style or
mechanical changes, continues to have a pronounced effect on the
seasonal movement of automobile sales.

We may be interested in seasonal variation either because we wish
statistically to eliminate the seasonal from a time series or because we are
interested in the seasonal movement itself. In Chapter 16 attention will
be given to deseasonalizing time series data for the purpose of making 
the other movements (particularly cyclical) more readily discernible. 

Interest in the seasonal movement itself may have any one of several 
objectives. First, it may be that we wish to "iron out" the seasonal so
that the intra-year fluctuation will be less pronounced. Thus, attempts
were made to build up the winter demand for ice cream by advertising:
"Ice cream is one of your best foods. Eat a plate a day." On the
production side, hens have been stimulated to lay in the off (winter) season
by increasing the length of their day with artificial light.

Second, a manufacturing establishment may wish to decrease the
seasonal nature of its activities by producing commodities with
complementary seasonals. Thus, one concern makes sleds and garden
cultivators. On a much larger scale is the objective of an under-water cable
from Britain to France to link the electric power systems of these two
countries. A large proportion of French electrical power comes from
hydroelectric plants that suffer from water shortages in the late summer
when Britain's coal-burning generators are working below capacity. On
the other hand, during most of the winter, when Britain's generators are
overloaded, France has surplus water to operate its hydroelectric plants.

Third, one may be interested in a seasonal movement in order to take
advantage of it. Thus, the housewife tries to buy fruit for canning or
preserving at the peak of the season when the price is low and when
quality may be high.

Although we shall not attempt to deal with them in this book, there are
also periodic movements which may be characterized as intra-month,
intra-week, and intra-day. As an example of an intra-month movement,
consider a commercial bank which may show peak activity around the
first and fifteenth of each month. If the bank is in an area where weekly
factory payrolls must be prepared, its business may show a characteristic
intra-week movement, too, which will depend upon the day (or days) of
the week on which the factories pay their employees. When monthly
and weekly peaks coincide, the staff of the bank may indeed be busy.
An interesting intra-week periodic is that observed by Sears, Roebuck
and Company in regard to the number of cash sales per pound of mail.*
During a normal week the figures are: Monday 30, Tuesday 37, Wednes-
day 35, Thursday 32, and Friday 31. The business of a restaurant
supplies an illustration of an intra-day movement. With three peaks
each weekday, the manager must plan ahead and have enough food and
enough help for these relatively short, busy times. The power cable
from Britain to France, which was just mentioned, will help to dove-
tail dissimilar intra-day demands for electricity in the two countries.
Although no one has yet devised an efficient method of storing power, as
such, it is possible to accumulate water behind a dam. If, during the dry
season or any other time of the year when the dams are not full, France
uses British power any time during a 24-hour period, some French water
is being stored behind French dams to help either country meet peak-load
demands.

* See "Estimating Daily Order Receipts from Weight of Mail," by C. W. Smalley,
The American Statistician, February 1954, pp. 14-15.

Cyclical movements. Cyclical movements are fluctuations which
differ from periodic movements in that they are of longer duration than
a year and also in that they do not ordinarily exhibit regular periodicity.
Business cycles are not random movements because the position of busi- 
ness at a given point in a cycle is affected by the activity in previous 
months and, in turn, affects business in the immediate future. In other
words, the transition from a low point to a high point, or vice versa, is a
progressive development. Cycles appear to operate somewhat on the
principle of a pendulum. Just as a pendulum is pulled by gravity toward
a vertical position, but tends constantly to move past its position of
equilibrium, so it is said that business is drawn toward an equilibrium by
the forces of demand and supply, and so also do the errors in one direction
tend to progress into errors in the opposite direction. Such an explana-
tion of business cycles is known as the "self-generative theory," usually
associated with the name of Wesley C. Mitchell. But just as the
mechanism impelling a pendulum must be wound up occasionally, so it is
possible that economic activity would attain equilibrium were it not for
other propulsions of varying degrees of intensity. It is possible to speak
of cycles in general business or of cycles in particular industries, such as
residential construction, cattle raising, or textile production. Rarely,
cycles in a specific industry or business may appear to be inherently
periodic, but they are, in any event, modified by the position of the cycle
in general business. Furthermore, since all industries are so inter-
dependent, a revival or recession in a key industry or industry group soon
transmits its effect to other branches of activity.

It appears that cyclical movements of general activity may be gener-
ated by a concurrence of the same cyclical phase in the activity of several
important industries; or they might be generated by interferences from
outside the business world. These interferences might be occasional
events of considerable magnitude, such as a war, a discovery, unusual
weather, or some political event; or they might be the simultaneous
occurrence of several minor events, each reinforcing the effect of the other.

When cycles appear to have a rough regularity, this regularity may
possibly be explained by the periodicity of certain of the extraneous
events which, some authorities believe, are in part responsible. Cycles
in weather have been suggested. It is more likely, however, that what
regularity can be observed is due to the fairly constant length of time it
takes the business world to respond to stimuli. For instance, the time
it takes for erecting a building or for foreclosing a mortgage, or even to
decide to go into bankruptcy, is not utterly irregular. Perhaps greater
regularity would be observable were it not for the irregularity of acci-
dental occurrences.

There are some who reject the concept of self-generating cycles,
believing that cycles are brought about largely by external influences.
Even these observers, however, are interested in noting whether produc-
tion and consumption are increasing or decreasing, and especially in
discovering practical measures for stabilization. Whether self-generated
or caused by external factors, it is clear, from Chart 11.10, that there
have been cyclical fluctuations in United States magazine advertising, and
that the cycles have not been of the same length. Chart 11.10 also
illustrates a difficulty frequently encountered in the study of time series.
It has to do with the decision concerning what is a cycle. Does the curve
of Chart 11.10 show about two large cycles or several smaller ones? A
decision may be influenced by the trend used for the series. As will be
seen later, the trend employed was a straight line fitted to the years
1915-1949 and extended through 1953. Had we concerned ourselves
with a shorter period of time, for example, 1933-1953, and made use of a
trend for only those years, two cycles would have appeared for the
twenty-one-year period.

Irregular variations. The irregular variations in a time series are
sometimes divided into two categories, episodic and accidental. When
episodic movements occur in a time series, they may be readily identifiable
in the chart of the series if they are due to specific events, such as earth-
quakes, conflagrations, strikes, early or late melting of ice on the Great
Lakes, severe storms, or other occurrences.

The unadjusted data of magazine advertising, shown in Chart 11.8,
would reveal a number of episodic movements to one who is familiar with
that field of activity. For example, shortly after the end of World War
II, there was an increase in the amount of magazine advertising space
used, which resulted in a less-than-seasonal decline in December 1945.
This appears as a sharp peak in the curve of the data adjusted for sea-
sonal movements, which is also shown in Chart 11.8. An episodic move-
ment which was important enough to be reflected in annual data appears
in Chart 11.3. The very high death rate in 1918 was the result of an
epidemic of influenza which caused many deaths among civilian and
military personnel.

As mentioned before, an episode may be important enough to generate,
or assist in generating, a cyclical fluctuation. Occasionally it may be
difficult to distinguish between an episodic movement and a cycle.

Accidental movements are minor fluctuations not attributable to
specific episodes and too small to merit individual consideration. These
accidental fluctuations may sometimes be of a random nature. The
irregular variations (accidental and episodic combined) for United States
magazine advertising are shown in Charts 16.7 and 16.8.

Other movements. The four movements which have been men-
tioned are the most prominent ones ordinarily found in time series.
Sometimes investigators find "long cycles," which are of much longer
duration than the usual business cycle and which may last roughly 50
years. Both types of cycles may be present simultaneously and super-
imposed on each other. Occasionally, students of time series claim the
existence of more than two cyclical components in a time series. Inter-
mediate between the long cycle and the business cycle, a movement
called "secondary trend" is sometimes found. In this text we shall give
no further attention to long cycles or secondary trends* but shall concen-
trate our attention on the four movements first mentioned.

Chart 11.8. Magazine Advertising in the United States (Broken Line) and
Deseasonalized Data (Solid Line), 1921-1953. Data from Table 16.3 and from
worksheets (not shown) for the years omitted from that table.

A GRAPHIC PREVIEW 

The nature of the four leading movements in a time series may be
understood more clearly if we look at some of the charts of data of United
States magazine advertising, which will be considered in more detail
later. The lighter broken line of Chart 11.8 shows the original data in
terms of thousands of agate lines. This curve includes all of the move-
ments: trend, seasonal, cyclical, and irregular. Chart 11.9 shows the
seasonal variation present in the series, and the solid line of Chart 11.8
shows the appearance of the data after they have been adjusted for
seasonal variation. The cyclical movements are indicated in Chart
11.10. No chart of the irregular movements is included here, but, as
noted before, they may be seen in Charts 16.7 and 16.8.

Chart 11.9. Seasonal Movements of Magazine Advertising in the United
States, 1921-1953. For sources of data, see note to Table 16.3.

PRELIMINARY TREATMENT OF DATA 

Some variations in time series are due to the terms in which the data 
are expressed, and at times it may be advisable to make certain adjust-
ments before undertaking to analyze a time series.

Calendar variation. Usually, though not always, there are 365 days
in a year. Although there are 12 months in each year, the months vary
in length from 28 to 31 days. To make matters more complicated, the
different months do not start on the same day of the week, nor does the
same month in successive years so start. Another difficulty has to do
with the number of working days in a month. Not only does the number
of Saturdays and Sundays vary between months, but February, with 28
or 29 days, has Washington's Birthday and Lincoln's Birthday, while
March, with 31 days, may include no holidays. February may include
as few as 18 working days, while March may have as many as 23. The
fluctuation of Easter between March and April also introduces an element
of confusion.

* For a discussion of these movements, see R. A. Gordon, Business Fluctuations,
Harper and Brothers, New York, 1952, pp. 201-209.

Chart 11.10. Cyclical Movements of Magazine Advertising in the United States,
1921-1953. Data from Table 16.5 and from worksheets (not shown) for the years
omitted from that table.

Although it seems impossible to divide the year into quarters contain-
ing the same number of whole weeks, nevertheless some business firms
have tried to minimize the difficulty. A few firms keep records by 4-week
periods. There are 13 such periods in a year, but quarterly data cannot
be kept by this system. A few other firms keep records by quarters, each
quarter being composed of three months, the first two months of four
weeks each and the third of five weeks. Of course, neither of these plans
is satisfactory so long as the first of a given calendar month may occur in
either of two artificial months. And under any plan, the unsystematic
occurrence of holidays results in a different number of working days in
successive artificial months. Movements have been launched to change
the calendar to remedy these defects. One plan suggests identical
quarters; each quarter would contain, not identical months, but three
monthly patterns of thirty or thirty-one days each, these three patterns
being repeated so as to occur four times a year. An extra day, however,
known as Year Day, would occur at the middle of the year.

The statistician is sometimes confronted with the problem of adjusting
a time series either for the number of calendar days in a month or for the
number of working days in a month. If monthly data of the residential
consumption of water are to be adjusted for calendar variation, the
appropriate adjustment would doubtless be on the basis of calendar days
rather than working days. This adjustment is accomplished by dividing
each monthly figure by the number of days in the month, giving consump-
tion per day. If it is desired to retain the figures in their original magni-
tude, the consumption per day may be multiplied by the average number
of days per month, which is 365 ÷ 12 = 30.4167 for a 365-day year.
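
The calendar-day adjustment just described may be sketched in a few lines
of Python. This is only an illustration under stated assumptions: the
function name adjust_for_calendar_days and the consumption figures are
hypothetical, and the average month length is taken from the year in
question (365 ÷ 12, or 366 ÷ 12 in a leap year).

```python
import calendar


def adjust_for_calendar_days(monthly_totals, year):
    """Rescale 12 monthly totals to a month of average length.

    Each total is divided by the number of days in its month (giving
    the figure per day) and then multiplied by the average number of
    days per month for that year.
    """
    days_in_year = 366 if calendar.isleap(year) else 365
    average_days = days_in_year / 12
    adjusted = []
    for month, total in enumerate(monthly_totals, start=1):
        days = calendar.monthrange(year, month)[1]  # days in this month
        per_day = total / days
        adjusted.append(per_day * average_days)
    return adjusted


# Hypothetical data: a constant use of 100 units per day throughout 1953.
raw = [100 * calendar.monthrange(1953, m)[1] for m in range(1, 13)]
adjusted = adjust_for_calendar_days(raw, 1953)
```

With a constant daily rate, the raw monthly totals differ only because the
months differ in length; after adjustment all twelve figures are equal,
which is the purpose of the correction.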

For monthly production data, the adjustment for calendar variation 
would involve consideration of the number of working days rather than 
calendar days in each month. To adjust for the number of working 
days, the following procedure may be followed: 

(1) Ascertain the holidays observed by the industry. These will 
differ in different industries and in different localities.

(2) Count the number of holidays observed in each month of each year.

(3) Count the number of Sundays in each month of each year, if 
Sunday is not a working day. 

(4) Count the number of Saturdays in each month of each year, if 
Saturday is not a working day. If Saturday is a half-holiday, the 
count should be halved. 

(5) For each month, add the counts obtained in (2), (3), and (4). The
resulting figure is the total number of non-working days, including 
allowance for an extra holiday if a regular holiday occurred on 
Saturday or Sunday. If no such extra holidays are observed, 
appropriate subtractions should be made when a holiday occurs on 
a Saturday or a Sunday. 

(6) Obtain the number of working days for each month by subtracting
the figure obtained in (5) from the number of calendar days.

(7) Divide the original data for each month by the number of working
days for the month to obtain production per working day. The
data may be restored to their original magnitude by multiplying
the production per working day by the average number of working
days per month for the year under consideration. This average
may vary slightly from year to year. 
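
Steps (1) to (7) may be sketched as follows. The sketch rests on several
assumptions that are not part of the text: the function names are
hypothetical, the holiday list is supplied by the user, Sunday is never a
working day, and no substitute holiday is observed when a holiday falls on
a Saturday or Sunday (so such a day is simply counted once as a
non-working day).

```python
import calendar
from datetime import date


def working_days(year, month, holidays=(), saturday_weight=0.0):
    """Count the working days in a month.

    saturday_weight: 0.0 if Saturday is not worked, 0.5 if it is a
    half-holiday, 1.0 if it is a full working day.
    """
    holidays = set(holidays)
    days_in_month = calendar.monthrange(year, month)[1]
    total = 0.0
    for day in range(1, days_in_month + 1):
        current = date(year, month, day)
        weekday = current.weekday()  # Monday = 0, ..., Sunday = 6
        if weekday == 6 or current in holidays:
            continue  # Sundays and holidays are non-working days
        total += saturday_weight if weekday == 5 else 1.0
    return total


def per_working_day(monthly_totals, year, holidays=(), saturday_weight=0.0):
    """Return (figures per working day, figures restored to original magnitude)."""
    counts = [working_days(year, m, holidays, saturday_weight)
              for m in range(1, 13)]
    per_day = [total / n for total, n in zip(monthly_totals, counts)]
    average = sum(counts) / 12  # average working days per month this year
    restored = [value * average for value in per_day]
    return per_day, restored


# Hypothetical holiday list: Lincoln's and Washington's Birthdays, 1953.
holidays_1953 = [date(1953, 2, 12), date(1953, 2, 22)]
feb = working_days(1953, 2, holidays_1953)  # Saturday not worked
mar = working_days(1953, 3, holidays_1953)
```

Under these assumptions February 1953 has fewer working days than March
1953, so a constant rate of output per working day would show a spurious
rise from February to March in the unadjusted monthly totals.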

It would be entirely inappropriate to adjust some time series for
calendar variation. Clearly it would be spurious to do so for executive,
administrative, and supervisory salary expenses of most corporations,
since such salaries are usually paid on a monthly basis irrespective of the
number of days or working days in a month. For data requiring adjust-
ment, it is frequently a difficult statistical problem to decide whether to
adjust for working days or merely for calendar days. For some com-
modities it can logically be maintained that holidays within a month, far
from decreasing consumer purchases during that month, may actually
increase them. If the holiday occurs on the last day of the month and
the stores are closed, however, it might decrease sales. In organizations
which receive orders through the mail from a considerable distance, sales
may be decreased by holidays occurring during the last few days of the
preceding month. Just what is the logical adjustment to make is often
very difficult to determine and requires familiarity with the business or
industry in question. In case of doubt it is always possible to determine
experimentally what method gives the smoothest results after the adjust-
ment is made. Such a test provides no conclusive evidence but is only
presumptive. Sometimes a separate adjustment should be made for
Easter, as explained in Chapter 15.

Population changes. It has already been noted that one element 
in an upward trend may be the increase in population. Data may be
adjusted for population change by dividing the original figures by the
population figures, thus expressing the data on a per capita basis. This 
is what was done in Chart 11.2. Alternatively, the population figures 
may be put in relative terms with the population for a selected census 
year, say 1920, set equal to 1.00, or 100 per cent. If the original data 
are then divided by the population relatives, the resulting figures will be 
in terms of a fixed (1920) population. 
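
The two forms of population adjustment may be illustrated with a short
Python sketch. The function names and all of the figures below are
hypothetical round numbers chosen for the illustration; they are not taken
from Chart 11.2 or from any table in this book.

```python
def per_capita(values, population):
    """Divide each figure by the population of the same year."""
    return [v / p for v, p in zip(values, population)]


def fixed_population_basis(values, population, base_position):
    """Express figures in terms of the population of one base year.

    The population of the base year is set equal to 1.00, and each
    original figure is divided by the population relative of its year.
    """
    base = population[base_position]
    relatives = [p / base for p in population]
    return [v / r for v, r in zip(values, relatives)]


# Hypothetical series for three census years.
years = [1920, 1930, 1940]
consumption = [530.0, 615.0, 660.0]  # millions of units
population = [106.0, 123.0, 132.0]   # millions of persons

per_person = per_capita(consumption, population)
fixed_1920 = fixed_population_basis(consumption, population, 0)
```

In this constructed case consumption grows exactly in step with
population, so the per capita series is constant and the series in terms of
the 1920 population shows no upward trend.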

Price changes. Interest often centers in physical volume changes
rather than changes which have occurred in terms of dollars. Series such
as sales, earnings, cost of materials, and others which are originally
expressed in dollars must be deflated in order to be expressed in terms
which are independent of price changes. Deflation is accomplished by
dividing the dollar series by an appropriate price index series. Table
11.1 shows the average hourly wages paid to employees of Class I railways
in each year from 1947 to 1952. To the right of the column of hourly
wages is given the Consumers' Price Index for the same years. Now, if
hourly wages in dollars for each year are divided by the corresponding
price index (expressed as a decimal), the result is a series of hourly wage
figures adjusted for changes in prices. These are shown in Column (4)
and are referred to as real wages or, specifically, wages in terms of
1947-1949 dollars. Chart 11.11 shows curves of hourly dollar wages and
hourly real wages. Even though prices rose during 1947-1952, hourly
real wages showed a steady increase. Note that the figures shown in
Table 11.1 and Chart 11.11 have to do with average hourly wages. To
ascertain if the railroad employees' purchasing power increased or
decreased over the period, we would have to consider the hours worked
during each year. It happens that the hours worked decreased slightly
from 1947 to 1952, but the real annual wages for employees of Class I
railways nevertheless showed a steady rise.

TABLE 11.1

Average Hourly Wages of Employees of Class I Railways
and Consumers' Price Index, 1947-1952

Data from Eastern Railroad Presidents Conference, A Yearbook of Railroad
Information, 1953 edition, p. 74, and Monthly Labor Review, September 1953,
p. 1034.

In Table 11.1 the Consumers' Price Index was used as a deflator. An
index of wholesale commodity prices, for example, would have been
entirely unsuitable. Unless a deflator is used that pertains to the data 
being deflated, a satisfactory adjustment for price changes will not be 
obtained. 
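
The arithmetic of deflation may be sketched as follows. The function name
deflate and the wage and index figures are hypothetical; they are not the
figures of Table 11.1. The index is assumed to be expressed with the base
period equal to 100, so it is divided by 100 to express it as a decimal
before the division is made.

```python
def deflate(dollar_series, price_index, base=100.0):
    """Divide each dollar figure by its price index expressed as a decimal."""
    return [d / (p / base) for d, p in zip(dollar_series, price_index)]


# Hypothetical hourly wages (dollars) and price index (base period = 100).
wages = [1.20, 1.32, 1.50]
index = [96.0, 100.0, 112.0]
real_wages = deflate(wages, index)
```

In this illustration the dollar wage rises by 0.30 between the first and
last years, but the real wage rises by only about 0.09, since part of the
dollar increase merely offsets the rise in prices.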

Chart 11.11. Average Hourly and Average Real Hourly Wages of Employees
of Class I Railways, 1947-1952. Data of Table 11.1. Real wages are in terms
of the Consumers' Price Index, which has 1947-1949 = 100. (Vertical axis:
dollars per hour.)

Securing comparability. Statisticians for trade associations experi-
ence considerable difficulty in obtaining prompt reports from all members.
For instance, 93 firms might report on time one month and 96 the next,
the latter not necessarily, however, including all the 93 firms. To be
strictly accurate, a new time series should be constructed each month
for the entire period including all of, and only, those firms which reported
promptly for the month in question. Thus, a complete time series one
month would be computed for the 93 firms, and the next month for 96.
This is a very laborious procedure. An easier procedure is to make a
preliminary estimate by computing the current month's figure as a
percentage of the preceding month's figure for only those firms which
reported promptly for the current month, and to multiply the figure for
the preceding month (which now includes all firms) by this percentage.
A revised figure can be computed when all the reports have been obtained.
If an industry is expanding and new firms are appearing, it is, of course,
desirable to include them.
Increased employment and production may result from increased activity 
of existing firms or the appearance of new ones. Similarly, firms may 
cease to exist and must be dropped from a reporting list. 
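
The easier procedure for a preliminary estimate may be sketched in Python.
Everything in the sketch is hypothetical: the function name, the firm
labels, and the figures. It assumes that last month's report for every
firm is already in hand and that only some firms have reported for the
current month.

```python
def preliminary_estimate(previous_by_firm, current_by_firm):
    """Estimate the current total from the firms reporting promptly.

    previous_by_firm: last month's figure for every firm (complete).
    current_by_firm:  this month's figure for prompt firms only.
    """
    prompt = [firm for firm in current_by_firm if firm in previous_by_firm]
    previous_prompt = sum(previous_by_firm[firm] for firm in prompt)
    current_prompt = sum(current_by_firm[firm] for firm in prompt)
    ratio = current_prompt / previous_prompt  # current as a ratio of preceding
    previous_total = sum(previous_by_firm.values())
    return previous_total * ratio


# Hypothetical reports: firm D has not yet reported for the current month.
previous = {"A": 100.0, "B": 200.0, "C": 50.0, "D": 150.0}
current = {"A": 110.0, "B": 220.0, "C": 55.0}
estimate = preliminary_estimate(previous, current)
```

Here the three prompt firms together show a 10 per cent rise, so the
preceding month's total for all four firms, 500, is raised by 10 per cent
to give the preliminary figure. The estimate is replaced by the actual
total when the late report arrives.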

Another source of incomparability may be the fact that the unit of 
reporting has changed. If it is merely a question of changing from a 
pound basis to a ton basis, this is a simple matter. Where the product 
has changed in kind, however, it is difficult to find a satisfactory solution. 
How, for instance, can we compare the physical production of radios 
between 1935 and 1953? Not only was there a difference in the propor-
tion of radios of different grades sold in the two years, but radios that were
the same with respect to price, weight, number of tubes, or any other
readily measurable characteristic were still vastly different in their
capacity to render utility to the consumer. 



Symbols Used in Chapter 12

a: a constant in the equation Yc = a + bX; the value of Yc when X = 0;
the Y intercept.

b: a constant in the equation Yc = a + bX; the slope.

N: the number of items in a series.

Σ: upper-case Greek sigma, meaning "take the sum of."

X: a value of the X series.

X1, X2, X3, ..., XN: specific values of the X series.

X̄: the arithmetic mean of the X values.

Y: an observed value of the Y series.

Yc: a computed value of the Y series.

Y1, Y2, Y3, ..., YN: specific values of the Y series.

Ȳ: the arithmetic mean of the Y values.

CHAPTER 12


Analysis of Time Series:

SECULAR TREND I: THE STRAIGHT LINE


There are two important reasons for attempting to describe the trend
of a series by means of a curve. First, it may be desired to measure the
deviations from the trend. These deviations consist of cyclical, seasonal,
and irregular movements. Frequently, obtaining these deviations is but
one step in attempting to isolate cycles in order to study them. Second,
it may be desired to study the trend itself, in order to note the effect of
factors bearing on the trend, to compare one trend with another, to
discover what effect trend movements have on cyclical fluctuations, or to
attempt to forecast the future behavior of the trend.

The purpose for which measurements are made partly determines the
methods adopted. If the object is solely to isolate cycles, it seems
reasonable to suppose that the trend line chosen should pass through the
cycles in such a way as approximately to allow a balancing between the
positive and negative phases of each cycle. Whether a curve is deemed
to have accomplished this object depends, of course, upon our conception
of what constitutes a cycle in each case. If, on the other hand, the object
is to make comparisons, generalizations, or forecasts, the curve should be
not only logical, but also of such a nature that it can readily be expressed
by a mathematical formula. By means of such a formula a person can,
for instance, say that at a given time a series shows a certain ratio, or a
certain amount, of growth per annum, and that, if this tendency con-
tinues, the trend will reach a certain value at some specified time in the
future. Fitting a trend by a mathematical formula does not, however,
remove the subjective element from trend fitting. The statistician can
vary the behavior of the curve by selection of the type of formula he
employs, or the years to which he fits the curve. It remains true, there-
fore, that the statistician decides in advance, upon as objective and logical
a basis as possible, what he thinks the trend ought to look like, and then
selects the mathematical method that will closely approximate this
result.

TREND FITTED BY INSPECTION 

The simplest method of describing a trend graphically is by inspection. 
If the trend is a straight line, it may be drawn with the aid of a trans-
parent ruler or a tightly stretched piece of string. If the trend is non-
linear, it may be drawn freehand or use may be made of a spline, an
adjustable curve ruler, or a French curve.*



Chart 12.1. Magazine Advertising in the United States, 1915-1953, and
Straight-Line Trend Fitted by Inspection to the Years 1915-1949. Advertis-
ing-lineage data from Table 12.2. See notes following the title of Chart 12.3.

Chart 12.1 shows a fit of a straight-line trend, by inspection, to maga-
zine advertising in the United States for 1915-1949. Whenever a curve is
fitted to a set of data, a criterion of fit is involved. The trend of Chart
12.1 was drawn through the curve in such a manner that cyclical portions
above and below the trend line were judged, by inspection, to be about
equal. The trend line also passes through the approximate average
(determined by inspection) of the advertising lineage data at the middle
year, 1932. This highly subjective method is open to the objection that
may be made to all subjective methods: one determines what answer he
wants and then proceeds to determine it. However, as has already been
mentioned, very nearly the same result may be obtained by careful selec-
tion from among the numerous available mathematical procedures.


* These three devices are available from firms selling artists' and draftsmen's supplies.

LEAST-SQUARES FIT OF STRAIGHT LINE

A mathematical equation not only allows us to draw the trend of a 
time series but provides, also, in the trend equation, a concise definition 
of that trend. If the trend itself is to be studied, or is to be extended 
beyond the observed data, it is particularly desirable that the trend be 
described by an objectively determined equation.

The straight line. The simplest type of curve is the straight line,
which is described by an equation of the type Yc = a + bX, in which X
is the independent variable and Yc the trend value of the dependent
variable.* Since their values must be determined for each of the series
being analyzed, a and b are referred to as unknowns. They are also called
constants, since, once their values are determined, they do not change.

To take the simplest case, suppose that a = 0 and b = 1. The equa-
tion then becomes: Yc = X; and this means that with each increase of
one unit of the independent variable, the dependent variable also increases
one unit. This equation is plotted in the upper left section of Chart 12.2.
Incidentally, it should be observed that all four quadrants are shown in
this chart. Before attempting to plot a curve, it is well to draw up a
table of X and Yc values, as shown on the chart, in which are recorded the
computed values of Y that correspond to selected values of X. As a
matter of fact, only two points are needed to plot this or any straight line,
and most accurate results are obtained by using two X values a consider-
able distance from each other.

Other straight-line equations and their curves are shown in the other 
sections of Chart 12.2, an inspection of which yields the following informa-
tion: a is the value of Y when X is 0 (the value at the X origin), or, as
it is frequently termed, the Y intercept; while b indicates the steepness,
or slope, of the line. When b is positive, the slope is upward; when b
is negative, the slope is downward.

Although the straight-line trend of Chart 12.1 was obtained by inspec-
tion and not mathematically fitted to the data, we can nevertheless
determine its approximate equation. If the origin be taken at 1915, it
will be seen that the curve has a Yc value of 20, so a = 20. To determine
b, we merely need to ascertain the value of the trend for 1949, which is
43, take the difference between that value and the trend value for 1915,
and divide by the number of elapsed years, 34. This gives

(43 - 20) / 34 = 0.68,

which is the value of b, the amount of increase in the trend each year.
The equation, then, is

Yc = 20 + 0.68X.

Origin, 1915. X units, one year.

* The symbol Y will be used to designate an observed value of the dependent
variable, while Yc indicates a value that has been computed, usually from a
mathematical equation.

Trend equations for time series must always be accompanied by a state-
ment concerning the origin and X units. We must specify the X
units, since, as we shall see later, they may be one year, one-half year, or
one month. The origin must be indicated because series of data by years,
months, or other chronological units do not have a zero useful for fitting
purposes. Consequently, the statistician can select the X-origin where
he pleases, and we shall see later that it will be advantageous to choose
that origin at the middle of the chronological series whenever possible.

If we rewrite the equation for the trend of Chart 12.1, with 1932 as the
origin, we have

Yc = 31.6 + 0.68X.

Origin, 1932. X units, one year.

Note that the value of b is the same as before. The new a value may be
obtained either by reading the trend value for 1932 or by adding 17 times
the b value to the former a value. The value of b is multiplied by 17
because 1932 is 17 years removed from 1915.
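
The arithmetic of the last few paragraphs may be checked with a short
Python sketch. The function names are hypothetical and are introduced only
for the illustration; the trend readings of 20 for 1915 and 43 for 1949 are
those given above for Chart 12.1.

```python
def slope_from_two_points(y_start, y_end, elapsed_units):
    """Increase in the trend per X unit between two trend readings."""
    return (y_end - y_start) / elapsed_units


def shift_origin(a, b, units_moved):
    """New a value when the origin is moved forward by units_moved X units."""
    return a + b * units_moved


def trend_value(a, b, origin_year, year):
    """Trend value Yc = a + bX, where X is measured in years from the origin."""
    return a + b * (year - origin_year)


b = round(slope_from_two_points(20, 43, 34), 2)  # rounded to 0.68
a_1915 = 20
a_1932 = shift_origin(a_1915, b, 1932 - 1915)    # 31.56, or about 31.6

# Both forms of the equation describe the same line.
value_1949_old_origin = trend_value(a_1915, b, 1915, 1949)
value_1949_new_origin = trend_value(a_1932, b, 1932, 1949)
```

Moving the origin changes only the a value; the b value and every trend
value computed from the equation are unaffected, apart from the rounding of
a to one decimal place in the equation as printed.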

Method of least squares. The method of least squares provides a
convenient device for obtaining an objective fit of a straight-line trend
to a series of data. It can also be applied to a number of more com-
plex trend types, some of which will be discussed in Chapter 13. The
method of least squares accomplishes two objectives:

1. The sum of the vertical deviations of the observed values from the fitted
straight line equals zero. If a vertical line were to be drawn, in Chart
12.3, from each Y value for 1915-1949 to the trend line, the vertical lines
extending upward from the trend line would exactly balance those extend-
ing downward. This trend is not the only straight line from which the
algebraic sum of the deviations equals zero; as a matter of fact, any
straight line (other than vertical) which passes through X̄, Ȳ fulfills this
requirement.

2. The sum of the squares of all these deviations is less than the sum of the
squared vertical deviations from any other straight line. It is because of
this second characteristic that the method of fitting is called the method
of least squares.* When a curve is fitted to meet this second requirement,
the first requirement is automatically satisfied.†


Chart 12.3. Magazine Advertising in the United States, 1915-1953, and
Trend as Shown by a Straight Line Fitted by the Method of Least Squares to
the Years 1915-1949. Data of Table 12.2. Note that inclusion of the first
four years had little effect on the trend because of the length of the
period covered. See page 279. Note also that two trends, one for the first
part of the series and one for the latter part (see page 278), might have
been used.

In a sense, a trend line fitted by the method of least squares is analogous
to the arithmetic mean, since the arithmetic mean is a single value, rather
than a series of values, summarizing a set of data and possessing the two
characteristics just mentioned.
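The two characteristics can be checked numerically. The sketch below is
only an illustration: the five pairs of values are hypothetical, and the
"rival" line is an arbitrary line drawn through the point of means with a
different slope. Its deviations also sum to zero, but their squares sum to
more than those of the least-squares line.

```python
def fit_least_squares(xs, ys):
    """Solve the two normal equations for a and b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


def deviations(xs, ys, a, b):
    """Vertical deviations Y - Yc of the observed values from the line."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_of_squares(values):
    return sum(v * v for v in values)


# Hypothetical illustrative data (not taken from the text).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]

a, b = fit_least_squares(xs, ys)
ls_dev = deviations(xs, ys, a, b)

# A rival line through the point of means with an arbitrary different slope.
mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
rival_b = 0.5
rival_a = mean_y - rival_b * mean_x
rival_dev = deviations(xs, ys, rival_a, rival_b)

# Both sums of deviations vanish (up to floating-point error) ...
print(abs(sum(ls_dev)) < 1e-9, abs(sum(rival_dev)) < 1e-9)
# ... but only the least-squares line minimizes the sum of squares.
print(sum_of_squares(ls_dev) < sum_of_squares(rival_dev))
```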


³ It can be demonstrated that the greatest probability of obtaining
deviations which are distributed normally (see Chapter 23) around some
computed value or series of values is obtained when the sum of the squared
deviations is at a minimum (see Appendix S, Section 12.1). If it is
believed that deviations from the appropriate norm are chance errors, it
follows that the method of least squares is the appropriate method of
fitting. The method is also convenient algebraically, as the student can
observe in connection with correlation analysis and analysis of variance.
Time series fluctuations around a trend line are not, however, independent
accidental occurrences, and it is to be doubted that there is any special
reason for using the method of least squares in trend fitting, other than
its convenience. Certain of the trends explained in this volume are, in
fact, fitted by other methods. Some statisticians even argue that the
least-squares criterion is not appropriate for time series trends, since
time series are sometimes characterized by extreme deviations not in
accordance with the normal law. The method of least squares, of course, is
particularly influenced by extreme deviations because of the squaring
process.

⁴ The mean of the $Y_c$ values is the same as the mean of the $Y$ values.
This is demonstrated in Appendix S, Section 12; before reading that
explanation, however, the reader should peruse the next section of this
chapter.
The normal equations. It has already been noted that the equation
for a straight line involves the two constants $a$ and $b$. For a fitted
straight line, the values of $a$ and $b$ must be determined from the
observed data; consequently, two normal equations must be obtained and
solved simultaneously. These normal equations are:

I. $\Sigma Y = Na + b\Sigma X$,

II. $\Sigma XY = a\Sigma X + b\Sigma X^2$.

Without attempting a derivation⁵ of these normal equations at this
point, we shall make use of a set of simple illustrative data to see how
these two equations are arrived at. The data are shown in Columns 1 and 2
of Table 12.1, and in Chart 12.4, where it may be seen that there are seven
pairs of X, Y values. We shall therefore first write down seven
observation equations, from which we shall obtain the two normal equations.
Column 3 of Table 12.1 shows the seven observation equations. Since the
observed data do not fall on a straight line, the seven observation
equations are not all consistent with each other. It is the purpose of the
two normal equations to enable us to arrive at a sort of average solution
of these observation equations.

TABLE 12.1

Determination of Normal Equations and of Sums for Fit of Straight Line

The first normal equation is obtained by multiplying each observation
equation by the coefficient of $a$ in that equation and adding. The
coefficients of $a$, which are 1, are shown in Column 4 of Table 12.1.
Column 5 shows the observation equations again (unchanged, since the
coefficients of $a$ were all 1) and their sum, which is the first normal
equation.

⁵ For a derivation of the two normal equations, see Appendix S, Section 12.2.

To get the second normal equation, each observation equation is multi- 
plied by the coefficient of b in that equation and the sum obtained. The 
coefficients of b are shown in Column 6 of Table 12.1 and the results of the 
multiplications are given in Column 7. The total of Column 7 is the 
second normal equation. 

Chart 12.4. A Straight Line, Fitted by the Method of Least Squares,
to a Set of Illustrative Values. Data of Table 12.1. (Vertical axis: Y
values; horizontal axis: X values.)


The two normal equations may now be set down:

I. $20 = 7a + 21b$,

II. $74 = 21a + 91b$.

To solve these simultaneously, we multiply normal equation I by 3 and
subtract it from normal equation II, thus eliminating $a$ and obtaining one
equation with one unknown, $b$:

$$\begin{aligned}
\text{II.}\quad 74 &= 21a + 91b,\\
(\text{I} \times 3).\quad 60 &= 21a + 63b,\\
14 &= 28b,\\
b &= 0.5.
\end{aligned}$$

To get the value of $a$, we substitute the value of $b$ in either normal
equation I or II. Using normal equation I:

$$\begin{aligned}
20 &= 7a + 21(0.5)\\
   &= 7a + 10.5,\\
7a &= 9.5,\\
a  &= 1.357.
\end{aligned}$$

As a check, the values of $a$ and $b$ may be substituted in normal equation
II, as follows:

$$\begin{aligned}
74 &= 21(1.357) + 91(0.5)\\
   &= 28.5 + 45.5\\
   &= 74.0.
\end{aligned}$$


The equation of the fitted straight line (which is shown on Chart 12.4)
may now be written:

$$Y_c = 1.36 + 0.5X.$$

Notice that it was not necessary, in this case, to state the origin or the
X units, since the X values were not dates.
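The elimination carried out above can be written as a small routine that
works from the five sums alone. This is a sketch rather than a prescribed
procedure; the function name and argument names are our own.

```python
def solve_normal_equations(n, sum_x, sum_y, sum_xy, sum_x2):
    """Solve  I.  sum_y  = n*a     + b*sum_x
              II. sum_xy = a*sum_x + b*sum_x2
    by multiplying I by sum_x/n and subtracting it from II to eliminate a."""
    factor = sum_x / n                      # 3 in the illustration
    b = (sum_xy - factor * sum_y) / (sum_x2 - factor * sum_x)
    a = (sum_y - b * sum_x) / n             # substitute b back into I
    return a, b


# Sums quoted in the text: N = 7, sum X = 21, sum Y = 20, sum XY = 74, sum X^2 = 91.
a, b = solve_normal_equations(7, 21, 20, 74, 91)
print(round(a, 3), b)

# Check by substituting in normal equation II.
print(round(21 * a + 91 * b, 1))
```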

The foregoing illustration was a specific instance involving but seven
pairs of values. To be more general, let us write the observation
equations for $N$ pairs of values as follows:

$$\begin{aligned}
Y_1 &= a + bX_1,\\
Y_2 &= a + bX_2,\\
Y_3 &= a + bX_3,\\
&\;\;\vdots\\
Y_N &= a + bX_N.
\end{aligned}$$

If, now, we multiply each of these observation equations by the coefficient
of $a$ (which is 1), they are unchanged and their sum is

I. $\Sigma Y = Na + b\Sigma X$.
This is the first normal equation. To get the second normal equation, we
multiply each observation equation by the coefficient of $b$ in that
equation and add, obtaining:

$$\begin{aligned}
X_1Y_1 &= aX_1 + bX_1^2,\\
X_2Y_2 &= aX_2 + bX_2^2,\\
X_3Y_3 &= aX_3 + bX_3^2,\\
&\;\;\vdots\\
X_NY_N &= aX_N + bX_N^2.
\end{aligned}$$

II. $\Sigma XY = a\Sigma X + b\Sigma X^2$.

Note that we write $a\Sigma X$ and $b\Sigma X^2$ rather than $\Sigma aX$
and $\Sigma bX^2$ because $a$ and $b$ are constants.

We are now in a position to use the two normal equations to determine
a straight-line trend. We shall not find it necessary to set up any more
observation equations; only the normal equations will be needed. For
the illustrative data of Table 12.1, only the totals of Columns 1, 2, 8,
and 9 and the value of $N$ are used, giving, for the two normal equations:

I. $20 = 7a + 21b$,

II. $74 = 21a + 91b$,

which is the same as the two equations shown at the bottom of Columns 5
and 7 of the table.

We shall make use of two, or more, normal equations not only to fit
trend lines by the method of least squares in this chapter and in Chapter
13, but we shall also employ them in Chapters 19, 20, and 21 when dealing
with linear, non-linear, and multiple correlation and in Chapter 22, as
well, where we correlate time series.
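The way the observation equations add up to the two normal equations may
be sketched as follows. Table 12.1 is not reproduced here, so the seven
pairs of values below are hypothetical; they were chosen only so that their
sums agree with those quoted in the text (N = 7, ΣX = 21, ΣY = 20,
ΣXY = 74, ΣX² = 91).

```python
def normal_equations(xs, ys):
    """Return the two normal equations, each as
    (left-hand side, coefficient of a, coefficient of b)."""
    eq_1 = [0.0, 0.0, 0.0]
    eq_2 = [0.0, 0.0, 0.0]
    for x, y in zip(xs, ys):
        # Observation equation: y = 1*a + x*b.
        # Equation I: multiply by the coefficient of a (which is 1) and add.
        eq_1[0] += y
        eq_1[1] += 1
        eq_1[2] += x
        # Equation II: multiply by the coefficient of b (which is x) and add.
        eq_2[0] += x * y
        eq_2[1] += x
        eq_2[2] += x * x
    return tuple(eq_1), tuple(eq_2)


# Hypothetical pairs whose sums match those quoted for Table 12.1.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1, 2, 3, 2, 4, 4, 4]

eq_1, eq_2 = normal_equations(xs, ys)
print(eq_1)  # (20.0, 7.0, 21.0)  i.e.  20 = 7a + 21b
print(eq_2)  # (74.0, 21.0, 91.0) i.e.  74 = 21a + 91b
```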

Odd number of years. The data of Table 12.2 and the solid curve of
Chart 12.3 show the amount of advertising in magazines in the United
States in millions of agate lines for 1915-1953. We shall fit a straight
line to the data for 1915-1949 and extend that trend line through 1953.
The two normal equations

I. $\Sigma Y = Na + b\Sigma X$,

II. $\Sigma XY = a\Sigma X + b\Sigma X^2$

will be used to determine the values of $a$ and $b$ for the straight-line
trend. However, it is possible to simplify them in such a manner that
simultaneous solution of the two equations will not be necessary. Owing to
the fact that years constitute the X variable, we must select an origin for
TABLE 12.2

Computation of Values for Fit of Straight Line to Data of Magazine
Advertising in the United States, 1915-1949

(Millions of lines)

Year        X         Y          XY    Subtotals   Trend values
                                         of XY          Yc

1915      -17      16.9      -287.3                    21.2
1916      -16      20.0      -320.0                    21.8
1917      -15      21.3      -319.5                    22.4
1918      -14      18.6      -260.4                    22.9
1919      -13      25.7      -334.1                    23.5
1920      -12      33.6      -403.2                    24.1
1921      -11      22.3      -245.3                    24.7
1922      -10      24.4      -244.0                    25.3
1923       -9      30.2      -271.8                    25.9
1924       -8      31.4      -251.2                    26.5
1925       -7      31.5      -220.5                    27.1
1926       -6      35.5      -213.0                    27.7
1927       -5      36.5      -182.5                    28.2
1928       -4      36.4      -145.6                    28.8
1929       -3      40.6      -121.8                    29.4
1930       -2      35.8       -71.6                    30.0
1931       -1      28.9       -28.9    -3,920.7        30.6
1932        0      21.2         0                      31.2
1933        1      18.7        18.7                    31.8
1934        2      24.3        48.6                    32.4
1935        3      25.4        76.2                    33.0
1936        4      28.5       114.0                    33.6
1937        5      32.1       160.5                    34.2
1938        6      25.4       152.4                    34.7
1939        7      25.6       179.2                    35.3
1940        8      26.9       215.2                    35.9
1941        9      27.7       249.3                    36.5
1942       10      25.7       257.0                    37.1
1943       11      33.1       364.1                    37.7
1944       12      42.0       504.0                    38.3
1945       13      49.0       637.0                    38.9
1946       14      54.8       767.2                    39.5
1947       15      50.8       762.0                    40.0
1948       16      47.8       764.8                    40.6
1949       17      43.8       744.6     6,014.8        41.2
1950       18*     45.8*                               41.8
1951       19*     48.1*                               42.4
1952       20*     48.3*                               43.0
1953       21*     50.5*                               43.6

Total       0   1,092.4     2,094.1

* Not used for computing trend.

Data from various issues of the Survey of Current Business.
that variable. Now, we can choose any year we wish, and in Table 12.2
it may be seen that the X origin was taken at 1932. By taking the origin
at 1932, the middle year, we have caused the sum of the X values to equal
zero, with the result that the two normal equations may now be written:

I. $\Sigma Y = Na$,

II. $\Sigma XY = b\Sigma X^2$.

Now, normal equation I gives the value of $a$ and normal equation II
yields the value of $b$. Table 12.2 shows the computation of $\Sigma Y$
and of $\Sigma XY$. $N$ is obtained by counting the number of years or by
subtracting the first year from the last and adding one. The value of
$\Sigma X^2$ could have been computed in Table 12.2. However, this is
never necessary for a time series problem, since the sums of the squares of
a series of natural numbers (1, 2, 3, . . .) may be read from Appendix B or
computed by means of the formula given in that Appendix. The sum of the
squares of the first 17 natural numbers is seen to be 1,785 in Appendix B,
so, for the magazine advertising data, $\Sigma X^2 = 2(1{,}785) = 3{,}570$.
We may now substitute in the two normal equations, obtaining

I. $a = \dfrac{\Sigma Y}{N} = \dfrac{1{,}092.4}{35} = 31.21$ and

II. $b = \dfrac{\Sigma XY}{\Sigma X^2} = \dfrac{2{,}094.1}{3{,}570} = 0.5866$.

The trend equation is

$$Y_c = 31.2 + 0.59X.$$

Origin, 1932. X units, 1 year.

The trend values for each year are shown in the last column of Table
12.2. An individual trend value is obtained by substituting the
appropriate X value (with sign) in the trend equation. When trend values
for all of the years are wanted, they may be obtained most expeditiously
by placing the $a$ value of 31.21 million agate lines opposite 1932 and
repeatedly adding the value of $b$ for the years 1933-1953. For 1931 to
1915, the value of $b$ is repeatedly subtracted from the 1932 trend value.⁶
The trend of the series is shown in Chart 12.3. Since two points determine
a straight line, it was drawn by plotting the trend values for 1915
and for 1949 and connecting these points. Selecting the two points well
toward the ends of the X series results in greater mechanical accuracy
in drawing the trend line. The trend has been extended through 1953
although the observed values for 1950-53 were not used to obtain the
trend. This is a customary procedure, since it is not practical or desirable
to recompute a new trend each year. Furthermore, it is not desirable to
have too many high or low values at the ends of a series. At a later point
in this chapter, it will be explained that, particularly for short series, a
trend should be fitted to data which begin and end with approximately
the same stage of a cycle. Since the trend for magazine advertising was
fitted to a period of 35 years, this consideration is of minor importance.
The effect of excluding some of the early years or of including the data for
1950-1953 will be commented upon toward the end of this chapter.

⁶ The repeated additions may be made on a calculating machine or, by adding
and subtotaling each time, on an adding machine. The repeated subtractions
may be done similarly. If an adding machine which has no subtraction key is
to be used, it is best to compute first the trend value for the first year
and then obtain the others by repeated addition.
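The short method for an odd number of years can be sketched as follows.
The closed form n(n + 1)(2n + 1)/6 for the sum of the squares of the first
n natural numbers is the standard one and is assumed here to be the formula
referred to in Appendix B; the function names are our own.

```python
def sum_of_squares_natural(n):
    """1^2 + 2^2 + ... + n^2, by the standard closed form."""
    return n * (n + 1) * (2 * n + 1) // 6


def centered_fit(sum_y, sum_xy, first_year, last_year):
    """Fit Yc = a + bX with the origin at the middle year of an odd-length
    series, so that sum X = 0 and the normal equations separate."""
    n_years = last_year - first_year + 1
    if n_years % 2 == 0:
        raise ValueError("an even number of years needs half-year X units")
    half = (n_years - 1) // 2                # years on each side of the origin
    sum_x2 = 2 * sum_of_squares_natural(half)
    a = sum_y / n_years                      # normal equation I
    b = sum_xy / sum_x2                      # normal equation II
    return first_year + half, a, b


# Totals from Table 12.2 for 1915-1949.
origin, a, b = centered_fit(1092.4, 2094.1, 1915, 1949)
print(origin, round(a, 2), round(b, 4))
```

Because ΣX vanishes, no simultaneous solution is needed: each normal
equation yields one constant directly.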

Chart 12.1 showed a straight-line trend fitted by inspection which was
found to have the equation

$$Y_c = 31.6 + 0.68X,$$

with origin at 1932 and X units 1 year. The least-squares trend equation
was

$$Y_c = 31.2 + 0.59X,$$

with the same origin and X units. Note that the two equations differ
very little in regard to their $a$ values, but that $b$ for the inspection
trend is larger. It is not to be expected that the two should agree. It has
already been noted that the criteria of fit for the two methods are
different. Furthermore, the criterion of equal areas for inspection fit is
not applied mathematically, but visually, and is therefore subject to
errors of judgment.

Even number of years. It may have occurred to the reader that the
time-saving device of taking the origin at the middle year might fail us
when it becomes necessary to deal with an even number of years. As a
matter of fact, we can continue to use the short forms of the normal
equations but we shall (1) take the origin between the two middle years and
(2) state the X values in terms of half-years. This has been done in
Table 12.3, in which the computations are performed for fitting a
straight-line trend to the production of sweet potatoes in the United
States for 1931-1952. The data are shown graphically in Chart 12.5.

In Table 12.3 the origin was taken between 1941 and 1942. From this
origin it is one-half year ($X = 1$) to the middle of 1942 and one-half
year ($X = -1$) to the middle of 1941. There is, of course, an interval of
two half-year periods between any two adjacent years; therefore, 1940 is
shown as -3, 1943 as 3, and so on. As before, the value of $\Sigma X^2$
need not be obtained by squaring and summing the X values. The sum of
the squares of a series of odd natural numbers (1, 3, 5, . . .) may be read
from Appendix C or computed by means of the formula given in that
appendix. From Appendix C the sum of the squares of the first 11 odd
natural numbers is seen to be 1,771, so
$\Sigma X^2 = 2(1{,}771) = 3{,}542$. We may now solve the two normal
equations for $a$ and $b$:

I. $a = \dfrac{\Sigma Y}{N} = \dfrac{1{,}331.4}{22} = 60.5$.

II. $b = \dfrac{\Sigma XY}{\Sigma X^2} = \dfrac{-3{,}459.4}{3{,}542} = -0.98$.

And the trend equation is

$$Y_c = 60.5 - 0.98X.$$

Origin, 1941-1942. X units, 1/2 year.

Chart 12.5. Production of Sweet Potatoes in the United States, 1931-1952,
and Trend as Shown by a Straight Line Fitted by the Method of Least Squares.
Data of Table 12.3.

This trend is shown on Chart 12.5 by a broken line. 

Note that the trend for sweet potato production has a downward slope.
The sign of $b$ in the trend equation is obtained as a result of the
computation of $\Sigma XY$, being negative when this sum is negative and
positive when this sum is positive.
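The half-year device can be sketched with the sweet potato figures of
Table 12.3. The closed form n(2n - 1)(2n + 1)/3 for the sum of the squares
of the first n odd numbers is the standard one and is assumed here to be
the formula referred to in Appendix C; the variable names are our own.

```python
def sum_of_squares_odd(n):
    """1^2 + 3^2 + ... + (2n - 1)^2, by the standard closed form."""
    return n * (2 * n - 1) * (2 * n + 1) // 3


# Production of sweet potatoes, 1931-1952 (millions of bushels), Table 12.3.
production = [67.3, 86.6, 74.6, 77.7, 81.2, 59.8, 68.1, 68.6, 61.7, 51.7,
              62.5, 65.5, 71.1, 68.3, 61.3, 60.8, 49.6, 43.1, 45.0, 49.8,
              28.8, 28.3]
n = len(production)                               # 22, an even number

# Origin between the two middle years; X in half-years: -21, -19, ..., 19, 21.
half_year_x = [2 * i - (n - 1) for i in range(n)]

sum_y = sum(production)
sum_xy = sum(x * y for x, y in zip(half_year_x, production))
sum_x2 = 2 * sum_of_squares_odd(n // 2)

a = sum_y / n            # normal equation I
b = sum_xy / sum_x2      # normal equation II; negative because sum XY is negative
print(round(a, 1), round(b, 2))
```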
Before leaving this illustration, it may be in order to point out that if
production data for sweet potatoes over a longer period, say, 1909-1952,
were to be considered, the trend would not be a straight line. For those
years, the trend would be slightly curved and concave downward.


TABLE 12.3

Computation of Values for Fit of Straight Line to Data of Production
of Sweet Potatoes in the United States, 1931-1952

(Millions of bushels of approximately 55 pounds)

Year        X         Y          XY    Subtotals   Trend values
                                         of XY          Yc

1931      -21      67.3    -1,413.3                    81.1
1932      -19      86.6    -1,645.4                    79.1
1933      -17      74.6    -1,268.2                    77.2
1934      -15      77.7    -1,165.5                    75.2
1935      -13      81.2    -1,055.6                    73.2
1936      -11      59.8      -657.8                    71.3
1937       -9      68.1      -612.9                    69.3
1938       -7      68.6      -480.2                    67.4
1939       -5      61.7      -308.5                    65.4
1940       -3      51.7      -155.1                    63.4
1941       -1      62.5       -62.5    -8,825.0        61.5
1942        1      65.5        65.5                    59.5
1943        3      71.1       213.3                    57.6
1944        5      68.3       341.5                    55.6
1945        7      61.3       429.1                    53.6
1946        9      60.8       547.2                    51.7
1947       11      49.6       545.6                    49.7
1948       13      43.1       560.3                    47.8
1949       15      45.0       675.0                    45.8
1950       17      49.8       846.6                    43.8
1951       19      28.8       547.2                    41.9
1952       21      28.3       594.3     5,365.6        39.9

Total       0   1,331.4    -3,459.4

Data from U. S. Department of Agriculture, Agricultural Statistics, 1952,
p. 308, and 1952 Annual Summary, Acreage, Yield, and Production of
Principal Crops, p. 37.


ADAPTING EQUATIONS TO A MONTHLY BASIS 

In the preceding illustrations, trend lines were fitted to annual, rather
than to monthly, data. The process of fitting a straight-line trend to
monthly data is no different from that of fitting to annual data, but there
are 12 times as many observed values to be considered and, because the
X values become larger, the labor is multiplied by more than 12. It is
therefore advisable to fit a straight-line trend to annual data and then to
transform the trend to a monthly basis. The result is ordinarily the same
as if the trend had been fitted to the monthly data. In some cases, it is
preferable to obtain the trend from annual data, since the presence of a
very violent seasonal movement may distort a trend fitted to monthly data.
Annual totals: X units, one year. The trend for the annual data
of magazine advertising for 1915-1949 was found to be
$Y_c = 31.2 + 0.59X$ with origin at 1932 and with X units of one year. The
basic data were in terms of millions of agate lines of advertising per
year; each figure, therefore, was a total for the year to which it
referred.

The value obtained for $a$ (to four digits) was 31.21 millions of agate
lines, and $a = \Sigma Y / N = 31.21$ was the arithmetic mean of the 35
figures for the years 1915-1949. Since the figure 31.21 was the $a$ value
for annual totals, the $a$ value in monthly terms would be one-twelfth of
it, or 2.6008 millions of agate lines.

From the annual data, $b$ was found to be 0.5866 millions of agate lines.
Now this is the annual increase in the amount of magazine advertising for
an entire year. If we divide by 12, we obtain the monthly trend increment
in the yearly totals. Since we still have yearly totals, we must divide
again by 12 to reduce the figures to millions of agate lines per month.
We perform both of these operations at once by dividing by 144, giving
a monthly $b$ value of 0.5866 ÷ 144 = 0.004074 millions of agate lines.
The equation in monthly terms is

$$Y_c = 2.6008 + 0.004074X.$$

Origin, June-July 1932. X units, 1 month.

Our adjustment is not quite completed. Owing to the fact that there
are an even number of months in a year, the equation just obtained has
an origin which falls between the two middle months and is therefore out
of step with the original monthly data by one-half month.⁷ Consequently,
we must shift the origin from a point between two months to any
convenient month. Let us shift it to July 1932. This merely calls for
increasing the value of $a$ by one-half of the monthly $b$ value, or
(0.5 × 0.004074) = 0.002037. The value of $b$ remains unchanged. The new
equation, then, is

$$Y_c = 2.6028 + 0.004074X.$$

Origin, July 1932. X units, 1 month.

We shall record only four digits when we use this equation to obtain
monthly trend values in Table 16.3.

⁷ This will always be true, irrespective of whether the original data were
first-of-the-month, middle-of-the-month, end-of-the-month, or any other
sort. It would not occur if a 13-month year were used.
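The three steps just described (divide a by 12, divide b by 144, move the
origin one-half month forward) may be sketched as follows. The function
name is our own, and the sketch covers only the case treated above: annual
totals with X units of one year and the origin shifted to July.

```python
def annual_totals_to_monthly(a_annual, b_annual):
    """Convert Yc = a + bX (annual totals, X units of one year) to a monthly
    equation whose origin is the middle of July of the origin year."""
    a_month = a_annual / 12      # yearly total -> average month
    b_month = b_annual / 144     # per year -> per month, then totals -> monthly
    a_july = a_month + 0.5 * b_month   # shift origin from June-July to July
    return a_july, b_month


a_july, b_month = annual_totals_to_monthly(31.21, 0.5866)
print(round(b_month, 6))
print(round(a_july, 3))
```

Carried to more places, the shifted a differs from the 2.6028 of the text
only in the last digit, since the text rounds a to 2.6008 before adding
0.002037.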
Annual totals: X units, one-half year. When a straight-line
trend was fitted to the production of sweet potatoes for 1931-1952, the
resulting equation had X units of 1/2 year because the data covered an even
number of years.⁸ It would not be particularly meaningful to reduce the
annual trend equation for sweet potato production to a monthly basis,
because sweet potato production does not take place every month in the
year. Neither is an illustration necessary here, since the procedure is
exactly the same as that just described except for the fact that $b$ is
divided by 6 × 12 = 72 instead of by 144. This is so because the $b$ value
in the annual trend equation refers to the increase taking place in the
trend during each six-month period.

Monthly averages: X units, one year. If a straight-line trend has
been fitted to annual data which are monthly averages for each of an odd
number of years, it is merely necessary to divide the annual $b$ by 12 and
shift the origin so that it will be compatible with monthly data. Suppose
that a trend for the years 1928-1952 has been obtained for the production
of a manufactured commodity, the annual trend equation being

$$Y_c = 2{,}430 + 24.0X.$$

Origin, 1940. X units, 1 year.

Since the original data were monthly averages for each year, the value
of $a$ does not need to be adjusted. The value of $b$ represents the annual
increase and must be divided by 12 to obtain the monthly trend increment.
The monthly trend equation then is

$$Y_c = 2{,}430 + 2.0X.$$

Origin, June-July 1940. X units, 1 month.

To complete the adjustment, we must shift the origin of the equation
so that it will coincide with a month instead of falling between two
months. If the origin is shifted to June 1940, it is merely necessary to
decrease the value of $a$ by one-half of the value of the monthly $b$,
giving

$$Y_c = 2{,}429 + 2.0X.$$

Origin, June 1940. X units, 1 month.

Monthly averages: X units, one-half year. The procedure is the
same as that just described except that the semiannual $b$ is divided by 6
instead of by 12.

⁸ An annual trend equation, such as that for the production of sweet
potatoes, could be shifted so that the X units would be 1 year instead of
one-half year. This merely requires doubling the value of $b$. However, it
would also be necessary to shift the origin so that it would fall on a year
instead of between two years.
The foregoing discussion of the procedure for shifting annual straight-
line trend equations to a monthly basis may be summarized for purposes
of reference as follows:

X unit in                          Type of data
annual              Monthly averages              Annual totals
equation            a             b               a              b

One year        No change    Divide by 12    Divide by 12   Divide by 144
One-half year   No change    Divide by 6     Divide by 12   Divide by 72

Under all circumstances, the origin must be shifted so that it falls on a
month instead of between two months.
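The summary table lends itself to a small lookup. The sketch below is one
possible arrangement, not a prescribed one: the dictionary keys, the
function name, and the `shift` argument are our own labels, while the
divisors are those of the table.

```python
# (type of data, X unit in annual equation) -> (divisor for a, divisor for b)
DIVISORS = {
    ("monthly averages", "one year"): (1, 12),
    ("monthly averages", "one-half year"): (1, 6),
    ("annual totals", "one year"): (12, 144),
    ("annual totals", "one-half year"): (12, 72),
}


def to_monthly(a, b, data_type, x_unit, shift="later"):
    """Put an annual straight-line trend on a monthly basis and move the
    origin one-half month to the later or the earlier adjacent month."""
    a_divisor, b_divisor = DIVISORS[(data_type, x_unit)]
    a_month, b_month = a / a_divisor, b / b_divisor
    half_step = 0.5 * b_month
    if shift == "later":
        return a_month + half_step, b_month
    return a_month - half_step, b_month


# Monthly averages, X units of one year, origin moved back to June 1940.
print(to_monthly(2430, 24.0, "monthly averages", "one year", shift="earlier"))
# (2429.0, 2.0)
```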

SELECTING THE PERIOD FOR TREND ANALYSIS 

In general, it is desirable to use as long a period as possible when a trend 
is to be determined. This practice results in a more reliable statement 
of the trend and one which is less affected by one or two large cyclical 
movements. 

If the nature of the trend of a series has changed, it may be necessary
to use two trends. It may or may not be possible to splice the two
trends together. The depression of the 1930's was so severe that, for
some series, it now seems to have been more in the nature of a
readjustment. Consequently, one may occasionally use one trend for the
years before the readjustment but a different one for the years following
the readjustment. It would have been possible to fit two trends to the data
of magazine advertising, shown in Chart 12.3, but we chose to show those
data in terms of a single trend covering a longer period of time.

It is important that the first few and the last few years of a series be
given special consideration before a decision is made concerning the
period to be used. If the data cover only ten or fifteen years, this is of
particular importance; for longer periods, it is less important. The first
year should not be one of depression and the last year one of prosperity,
since that will cause an upward trend to be too steep; $b$ will be too
large. Conversely, if the first year was one of prosperity while the last
year was one of depression, the slope, if upward, would not be steep
enough; $b$ would be too small. To avoid the introduction of such
extraneous factors in the slope, the first and last years should be on
opposite sides of the cycle (not on opposite sides of the trend) and about
the same distance above, or below, the trend. Thus, in Chart 12.6
CD = C'D' and a trend fitted to data extending from D to D' will have the
correct slope.

Not only should the slope be correct, but the level of a trend should also
be suitable. If a trend were fitted to the data of Chart 12.6 running from
D to D', the level of the trend would be too high. The trend should be
fitted to a period running from B to B'. This would result in a proper
level for the trend, since the areas ABE and A'B'E' are each one-fourth
of a cycle. The first and last years should not both be low points of
particularly deep depressions, since they would then lower the level of the
trend; $a$ would be too small. Conversely, the end years should not both
be high points of marked prosperity, since they would then raise the level
of the trend unduly.

Chart 12.6. Cycles and Appropriate Trend.

The trend for magazine advertising was fitted to the years 1915-1949.
Although, as may be seen in Chart 12.3, the series does not begin and end
with the same phase of a cycle, the trend is satisfactory because the period
covered is relatively long. What changes would occur in the trend
equation if some of the early years had been omitted or some of the later
years included? The equation obtained earlier for the period 1915-1949
was

$$Y_c = 31.2 + 0.59X,$$

with origin at 1932 and X units 1 year. Continuing to use the same
origin and X units, the reader can verify, by computations based upon
Table 12.2, that if the first four years were omitted, the trend equation
for 1919-1949 would be

$$Y_c = 31.8 + 0.50X.$$

In view of the rules laid down in the preceding paragraphs, 1919-1949
may be more appropriate than 1915-1949 as the period for which a trend
should be determined. However, owing to the length of the series, the
results differ little; the 1919-1949 equation, if drawn on Chart 12.3, could
be distinguished from the 1915-1949 trend only toward the ends.

If the last four years were to be added, the trend equation for 1915-1953
would be

$$Y_c = 31.6 + 0.67X.$$

This equation, too, if drawn on Chart 12.3, could be distinguished from
the 1915-1949 trend only toward the ends.
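The verification suggested to the reader can be sketched as follows. The
figures are the Y values as read from Table 12.2 (millions of agate lines,
1915-1953). Because the origin is held at 1932 while the period changes,
ΣX is no longer zero and both normal equations must be solved together;
the function name is our own.

```python
YEARS = list(range(1915, 1954))
LINES = [16.9, 20.0, 21.3, 18.6, 25.7, 33.6, 22.3, 24.4, 30.2, 31.4,
         31.5, 35.5, 36.5, 36.4, 40.6, 35.8, 28.9, 21.2, 18.7, 24.3,
         25.4, 28.5, 32.1, 25.4, 25.6, 26.9, 27.7, 25.7, 33.1, 42.0,
         49.0, 54.8, 50.8, 47.8, 43.8, 45.8, 48.1, 48.3, 50.5]


def fit_trend(first_year, last_year, origin=1932):
    """Least-squares straight line for the chosen period, X in years from `origin`."""
    pairs = [(year - origin, y) for year, y in zip(YEARS, LINES)
             if first_year <= year <= last_year]
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


for period in [(1915, 1949), (1919, 1949), (1915, 1953)]:
    a, b = fit_trend(*period)
    print(period, round(a, 1), round(b, 2))
```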

SELECTING THE TYPE OF TREND 

Since the discussion, so far, has been limited to trends fitted by inspec- 
tion and to straight lines fitted by the method of least squares, there is 
not much that can be said at this point concerning the selection of the 
type of trend. We shall be in a better position to consider which one of a 
number of possible trend types is most appropriate after some additional
types have been described in the following chapter. 

As a first step, the original data should always be plotted and examined. 
It may even be worth while to sketch in a tentative trend by inspection. 
In some instances a trend fitted by inspection may suffice; but when the 
trend itself is to be studied, or extended, a mathematical equation should 
be used. If examination of the charted data indicates that the trend is
not linear, one of the trend types described in Chapter 13 may be
appropriate. The trend type chosen should be one which is logical in
relation to the series which it undertakes to describe and in relation to
the forces affecting that series. It is for this reason that a straight
line, which indicates a constant amount of increase or decrease, cannot be
expected to constitute an appropriate trend of a series over an extended
period of time.
Symbols Used in Chapter 13

$a$: a constant in various trend equations.

$A$: a constant in an orthogonal polynomial of the first, or higher, degree.

$b$: a constant in various trend equations.

$B$: a constant, associated with $X_1$, in an orthogonal polynomial of the
first, or higher, degree.

$c$: a constant in a polynomial of the second, or higher, degree. As a
subscript, $c$ distinguishes a computed value from an observed value;
see $Y_c$.

$C$: a constant, associated with $X_2$, in an orthogonal polynomial of the
second, or higher, degree.

$d$: a constant in a polynomial of the third, or higher, degree.

$D$: a constant, associated with $X_3$, in an orthogonal polynomial of the
third, or higher, degree.

$e$: a constant in a polynomial of the fourth, or higher, degree.

$f$: a constant in a polynomial of the fifth, or higher, degree.

$k$: the asymptote of an asymptotic growth curve.

$k_0$, $k_1$, $k_2$: When one logistic curve is built upon part of another,
$k_0$ is the upper asymptote of the first logistic curve and $k_1$ and
$k_2$ are, respectively, the lower and upper asymptotes of the second
logistic curve.

$\mu$: lower-case Greek mu, used to assist in determining the trend values
for a logistic curve.

$n$: for a modified exponential or a Gompertz curve, the number of years
in each third of the series; for a logistic curve, the number of time units
between $x_0$ and $x_1$, or between $x_1$ and $x_2$.

$N$: the number of items in a series.

$r$: a subscript of $X$ in an orthogonal polynomial; it may have a value of
1, 2, 3, etc.

$\Sigma$: capital Greek sigma, meaning "take the sum of."

$\Sigma_1$, $\Sigma_2$, $\Sigma_3$: respectively, the sums of values for
the first, second, and third equal parts of a series.

$x_0$, $x_1$, $x_2$: when fitting a logistic curve, the years associated
with $y_0$, $y_1$, and $y_2$.

$X$: a value of the X series.

$X_1$, $X_2$, $X_3$, etc.: variables in orthogonal polynomials.

$y_0$, $y_1$, $y_2$: the three selected Y values used for fitting a
logistic curve.

$Y$: an observed value of the Y series.

$Y_c$: a computed value of the Y series.

$!$: factorial. $5! = 1 \times 2 \times 3 \times 4 \times 5$.
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CHAPTER 13 


Analysis of Time Series: 

SECCLAR TREND il NON-LINEAR TRENDS 


Chapter 12 discussed only the simplest type of trend equation, the
straight line. It was noted that, for short periods of time, a straight line
may provide a reasonably good description of the trend of a series, but
that for longer periods a curved line of some sort may be called for. This
chapter will describe the properties of several non-linear equation types,
will explain how to fit them, and will give some indication of how to
proceed in choosing among the various trend types.

SIMPLE POLYNOMIALS

This family of curves has as its most elementary representative the
straight line, which, it will be remembered, has two constants. The
straight line and four other polynomials are shown below:

First-degree (straight line): $Y_c = a + bX$.

Second-degree: $Y_c = a + bX + cX^2$.

Third-degree: $Y_c = a + bX + cX^2 + dX^3$.

Fourth-degree: $Y_c = a + bX + cX^2 + dX^3 + eX^4$.

Fifth-degree: $Y_c = a + bX + cX^2 + dX^3 + eX^4 + fX^5$.

When a third constant is added to the equation for the straight line,
the second-degree curve, which has one bend, is obtained. Because of the
bend in the second-degree curve, the slope of the curve is continually 
changing. If a sufficient number of X values are included, the second- 
degree curve will have a positive slope in one portion and a negative slope 
in another. This may be observed in Chart 13.1, which shows eight 
second-degree curves. 

Each constant added to the second-degree equation may introduce an
additional bend in the curve. Thus, a third-degree curve may have two
bends, as shown in Chart 13.2. The lower of the two curves in Chart
13.2 shows clearly the fact that the slope of a third-degree curve may
change twice from positive to negative or from negative to positive.
Since such a change in the direction of slope may occur three times in a
fourth-degree curve and four times in a fifth-degree curve, it follows that
fourth- and fifth-degree curves hardly coincide with the concept of secular
trend which is of interest to us. Consequently, we shall give no further
attention to fourth- and fifth-degree curves, but shall describe the process
of fitting the second-degree curve in some detail and briefly consider the
third-degree curve.

Second-degree curve. The second-degree curve is only a little more
complicated than a straight line, since it involves merely the addition of
$cX^2$ to the equation for a straight line, giving

$Y_c = a + bX + cX^2$.

The second-degree equations, which have been plotted in Chart 13.1,
give some idea of the flexibility of this equation type. Portions
of such a curve fitted to a time series may slope upward or downward
(or upward in one portion and downward in another) and may be concave
upward or concave downward. While a straight line indicates a constant
amount of increase or decrease, a second-degree curve involves
increasing or decreasing amounts of increase or decrease. More
specifically: the second differences of the values obtained from the
expression $Y_c = a + bX + cX^2$ are constant.¹
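A quick numerical check of this property may help; it is a sketch, not part of the original text. It evaluates the equation for section 2 of Chart 13.1, $Y_c = -1 + 2X - 0.3X^2$, at X = -3, ..., 6 and takes differences as later value minus earlier value. That sign convention is an assumption made here; taking earlier minus later reverses the sign of the first differences but leaves the second differences unchanged. The function names are illustrative only.

```python
# Trend values of Y_c = a + bX + cX^2 at consecutive integer X values,
# followed by their first and second differences.
def second_degree(a, b, c, x):
    return a + b * x + c * x * x


def differences(values):
    # Later value minus earlier value for each adjacent pair.
    return [later - earlier for earlier, later in zip(values, values[1:])]


a, b, c = -1.0, 2.0, -0.3            # section 2 of Chart 13.1
xs = list(range(-3, 7))
trend = [second_degree(a, b, c, x) for x in xs]
first = differences(trend)
second = differences(first)

print([round(v, 1) for v in trend])
print([round(v, 1) for v in first])
print([round(v, 1) for v in second])  # every entry equals 2c
```

Each second difference equals 2c, which is why it does not depend on X.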

Fitting the second-degree curve. Since there are three constants or
unknowns in the second-degree curve, the following three normal
equations are required:

I. $\Sigma Y = Na + b\Sigma X + c\Sigma X^2$.

II. $\Sigma XY = a\Sigma X + b\Sigma X^2 + c\Sigma X^3$.

III. $\Sigma X^2Y = a\Sigma X^2 + b\Sigma X^3 + c\Sigma X^4$.

However, we are dealing with a time series, and the origin may be taken
at the middle year (or other time unit), or between the two middle years,
as before, with the result that the summations of all odd powers of X
are zero. Therefore, the three normal equations become

I. $\Sigma Y = Na + c\Sigma X^2$.

II. $\Sigma XY = b\Sigma X^2$.

III. $\Sigma X^2Y = a\Sigma X^2 + c\Sigma X^4$.

¹ This may be seen by considering the $Y_c$ values for section 2 of Chart 13.1, for
which the equation is $Y_c = -1 + 2X - 0.3X^2$:

   X      Y_c     First difference    Second difference
  -3     -9.7
  -2     -6.2          -3.5
  -1     -3.3          -2.9                -0.6
   0     -1.0          -2.3                -0.6
   1      0.7          -1.7                -0.6
   2      1.8          -1.1                -0.6
   3      2.3          -0.5                -0.6
   4      2.2           0.1                -0.6
   5      1.5           0.7                -0.6
   6      0.2           1.3                -0.6

Notice that, instead of having to solve three equations simultaneously,
the value of b is obtained from Equation II, while the values of a and c
are gotten by solving Equations I and III simultaneously. The use of the
middle year as the origin has enabled us to save much labor.

Chart 13.3. United States Production of Crude Gypsum, 1935-1952, and
Trend as Shown by a Second-Degree Curve. Data of Table 13.1. (Vertical
scale: millions of short tons.)

Table 13.1 and Chart 13.3 show the production of crude gypsum in the
United States for the years 1935 to 1952 inclusive. The upward trend
of the series is not linear, and these data will form the basis of our
illustration of a fit of a second-degree curve. The three normal equations
call for the numerical values of N, $\Sigma Y$, $\Sigma XY$, and $\Sigma X^2Y$,
which may be obtained from Table 13.1, and the values of $\Sigma X^2$ and
$\Sigma X^4$ (for the first nine odd natural numbers), which may be read from
Appendix C. Substituting in the three normal equations gives

I. 88,159 = 18a + 1,938c.

II. 358,287 = 1,938b.

III. 10,080,943 = 1,938a + 374,034c.



TABLE 13.1

Computation of Values for Fit of Second-Degree Curve to Production of
Crude Gypsum in the United States, 1935-1952
(Thousands of short tons)

                                                        Computation of trend values
Year     X   Production      XY         X^2 Y      X^2    a + bX      cX^2    Trend value
                 Y                                                               Y_c
1935   -17     1,881      -31,977      543,609     289    1,371.3   1,029.6     2,401
1936   -15     2,676      -40,140      602,100     225    1,741.0     801.6     2,543
1937   -13     3,014      -39,182      509,366     169    2,110.8     602.1     2,713
1938   -11     2,671      -29,381      323,191     121    2,480.5     431.1     2,912
1939    -9     3,196      -28,764      258,876      81    2,850.3     288.6     3,139
1940    -7     3,664      -25,648      179,536      49    3,220.0     174.6     3,395
1941    -5     4,706      -23,530      117,650      25    3,589.8      89.1     3,679
1942    -3     4,634      -13,902       41,706       9    3,959.5      32.1     3,992
1943    -1     3,919       -3,919        3,919       1    4,329.3       3.6     4,333
1944     1     3,754        3,754        3,754       1    4,699.0       3.6     4,703
1945     3     3,802       11,406       34,218       9    5,068.8      32.1     5,101
1946     5     5,615       28,075      140,375      25    5,438.5      89.1     5,528
1947     7     6,198       43,386      303,702      49    5,808.3     174.6     5,983
1948     9     7,044       63,396      570,564      81    6,178.0     288.6     6,467
1949    11     6,491       71,401      785,411     121    6,547.8     431.1     6,979
1950    13     8,119      105,547    1,372,111     169    6,917.5     602.1     7,520
1951    15     8,705      130,575    1,958,625     225    7,287.3     801.6     8,089
1952    17     8,070      137,190    2,332,230     289    7,657.0   1,029.6     8,687
Total    0    88,159      358,287   10,080,943

Data from U. S. Department of Commerce, Office of Business Economics, Business
Statistics, 1953 Biennial Edition, p. 185.


The value of b is given by the second normal equation:

II. 1,938b = 358,287,
b = 184.875.

Next, the values of a and c are obtained by solving normal equations I
and III simultaneously. The steps are:

1. Multiply normal equation I by 193 and subtract normal equation III
from this new form of normal equation I, thus obtaining² the value of a.

(I × 193). 17,014,687 = 3,474a + 374,034c.

III. 10,080,943 = 1,938a + 374,034c.

6,933,744 = 1,536a.

a = 4,514.15625.

² The multiplying factor 193 was obtained by dividing the coefficient of c in
normal equation III by the coefficient of c in normal equation I. That is,
$\Sigma X^4 \div \Sigma X^2$ = 374,034 ÷ 1,938 = 193. When solving two equations
simultaneously, either unknown may be eliminated by multiplying one of the
equations by the quotient of the coefficients of the unknown which is to be
eliminated and subtracting one equation from the other.





2. Substitute the value of a in normal equation I to obtain the value of c.

I. 88,159 = 18(4,514.15625) + 1,938c.

1,938c = 6,904.1875.
c = 3.56253225.

3. Substitute, in normal equation III, the values obtained for a and c.
This serves as a check of the computations in steps 1 and 2.

III. 10,080,943 = 1,938(4,514.15625) + 374,034(3.56253225)

= 10,080,943.

The second-degree trend equation may now be written:

$Y_c = 4{,}514.16 + 184.875X + 3.5625X^2$.

Origin, 1943-1944. X units, ½ year.

The computation of the trend values is shown in the last four columns
of Table 13.1. The trend, shown in Chart 13.3, is the result of plotting
these trend values. Note that the production of crude gypsum seems to
show four cycles during the years 1935-1952.
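The computation above can be retraced in a few lines of Python. This is a sketch rather than the hand procedure itself: the production figures are those of Table 13.1, the helper name `fit_second_degree` is hypothetical, and Equations I and III are solved by determinants instead of by the elimination steps shown in the text.

```python
# Production of crude gypsum, 1935-1952, thousands of short tons (Table 13.1).
production = [1881, 2676, 3014, 2671, 3196, 3664, 4706, 4634, 3919,
              3754, 3802, 5615, 6198, 7044, 6491, 8119, 8705, 8070]


def fit_second_degree(y):
    """Fit Y_c = a + bX + cX^2 with the X origin at the middle of the series."""
    n = len(y)
    # Odd N:  X = ..., -1, 0, 1, ...      (units of one period)
    # Even N: X = ..., -3, -1, 1, 3, ...  (units of half a period)
    step = 1 if n % 2 else 2
    xs = [step * i - (n - 1) * step // 2 for i in range(n)]
    sy = sum(y)
    sxy = sum(x * v for x, v in zip(xs, y))
    sx2y = sum(x * x * v for x, v in zip(xs, y))
    sx2 = sum(x ** 2 for x in xs)
    sx4 = sum(x ** 4 for x in xs)
    b = sxy / sx2                          # normal equation II
    det = n * sx4 - sx2 ** 2               # normal equations I and III
    a = (sy * sx4 - sx2 * sx2y) / det
    c = (n * sx2y - sx2 * sy) / det
    return xs, a, b, c


xs, a, b, c = fit_second_degree(production)
trend = [a + b * x + c * x * x for x in xs]
print(round(a, 5), round(b, 3), round(c, 8))
print([round(t) for t in trend])
```

Carrying b to more places than the text does changes the trend values only in the decimals.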

THIRD-DEGREE CURVE 

By adding one more constant to the equation for a second-degree curve, 
we are enabled to put one more bend into the curve. While a straight line 
has only one slope, a second-degree curve (Chart 13.1) slopes in a positive 
direction at one stage and in a negative direction at another, and a
third-degree curve (Chart 13.2) may include three directions of slope.

Four normal equations are required for a third-degree curve:

I. $\Sigma Y = Na + b\Sigma X + c\Sigma X^2 + d\Sigma X^3$.

II. $\Sigma XY = a\Sigma X + b\Sigma X^2 + c\Sigma X^3 + d\Sigma X^4$.

III. $\Sigma X^2Y = a\Sigma X^2 + b\Sigma X^3 + c\Sigma X^4 + d\Sigma X^5$.

IV. $\Sigma X^3Y = a\Sigma X^3 + b\Sigma X^4 + c\Sigma X^5 + d\Sigma X^6$.

Again, if the X origin is taken at the middle of the period, the odd powers
of X will total zero, leaving these equations:

I. $\Sigma Y = Na + c\Sigma X^2$.

II. $\Sigma XY = b\Sigma X^2 + d\Sigma X^4$.

III. $\Sigma X^2Y = a\Sigma X^2 + c\Sigma X^4$.

IV. $\Sigma X^3Y = b\Sigma X^4 + d\Sigma X^6$.

With the equations in this form, we do not have to solve four simultaneous
equations, although that would have been necessary if the origin had been
taken anywhere other than at the middle of the period. The values of
a and c are obtained by solving Equations I and III simultaneously;
simultaneous solution of Equations II and IV gives the values of b and d.
Only one column of figures, in addition to those shown in Table 13.1, must
be computed; it is a column headed $X^3Y$, the total of which gives
$\Sigma X^3Y$. Note that Equations I and III are exactly the same as for the
second-degree curve. Consequently, for a given set of data, the values of
a and c will be the same for a second-degree curve and for a third-degree
curve.
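As an illustration of how the four equations separate into two pairs, the sketch below fits a third-degree curve to the gypsum series of Table 13.1. The helper names (`power_sum`, `moment`, `solve_pair`) are hypothetical, and each pair of equations is solved by determinants, which is one convenient choice and not necessarily the method a hand computer would use.

```python
# Production of crude gypsum, 1935-1952, thousands of short tons (Table 13.1).
production = [1881, 2676, 3014, 2671, 3196, 3664, 4706, 4634, 3919,
              3754, 3802, 5615, 6198, 7044, 6491, 8119, 8705, 8070]
n = len(production)
xs = [2 * i - (n - 1) for i in range(n)]       # -17, -15, ..., 17


def power_sum(p):
    """Sum of X^p over the series."""
    return sum(x ** p for x in xs)


def moment(p):
    """Sum of X^p * Y over the series."""
    return sum(x ** p * y for x, y in zip(xs, production))


def solve_pair(p, q, r, s, u, v):
    """Solve p*m + q*w = u and r*m + s*w = v for (m, w) by determinants."""
    det = p * s - q * r
    return (u * s - q * v) / det, (p * v - u * r) / det


# Equations I and III give a and c; Equations II and IV give b and d.
a, c = solve_pair(n, power_sum(2), power_sum(2), power_sum(4),
                  moment(0), moment(2))
b, d = solve_pair(power_sum(2), power_sum(4), power_sum(4), power_sum(6),
                  moment(1), moment(3))
print(a, b, c, d)
```

The values of a and c printed here coincide with those of the second-degree fit, while b differs because Equation II now contains the term in d.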

Orthogonal polynomials. A minor disadvantage of polynomial
equations of the type described is that each additional constant added
to an equation requires that some of the constants previously obtained
be abandoned and new constants computed to take their place. Thus,
a second-degree curve uses the same value for b as a straight line, but
requires a different value for a; a third-degree curve uses the same values
for a and c as a second-degree curve, but requires a new value for b;
a fourth-degree curve uses the same values for b and d as a third-degree
curve, but new values must be calculated for a and c; and so on.
Orthogonal polynomial equations involve a transformation of such a nature
that, as new constants are added, the old constants remain the same.
Such equations are very convenient to use, since we merely build up our
equation by adding new constants until a satisfactory fit is obtained and
simultaneous solution of equations is avoided. There is thus no lost
motion, and the labor involved becomes progressively less than that
required to fit a curve by the ordinary method for equations of third
degree and higher. The trend values obtained by the two methods are the
same.

Although the labor required for fitting is modest, the theory of
orthogonal polynomials³ is beyond the scope of this text, and will not be
explained here. Whereas the ordinary third-degree polynomial is of the
type

$Y_c = a + bX + cX^2 + dX^3$,

the orthogonal polynomial is

$Y_c = A + BX_1 + CX_2 + DX_3$.

In working with orthogonal polynomials, the X origin is conveniently
taken at the middle, so that $\Sigma X = 0$. If N is odd, the X values are
taken as ... -3, -2, -1, 0, +1, +2, +3 ... in the usual fashion;
if N is even, they are taken as ... -2.5, -1.5, -.5, +.5, +1.5, +2.5
... . The variables $X_1$, $X_2$, $X_3$ ... are derived from the moments
of the X series. In form easy to use, these are:

$X_1 = X$.

$X_2 = X_1^2 - \frac{N^2 - 1}{12}$.

$X_3 = X_1^3 - \frac{3N^2 - 7}{20} X_1$.

$X_r = X_1 X_{r-1} - \frac{(r-1)^2 [N^2 - (r-1)^2]}{4[4(r-1)^2 - 1]} X_{r-2}$.

N is, as usual, the number of items in the series (the number of years or
months), and r is the subscript of the X under consideration. Each of
these equations is worked out, and in the computation table there will
be column headings for $X_1$, $X_2$, and $X_3$. The constants A, B, C, and
D will be obtained as follows:

$A = \frac{\Sigma Y}{N}$.

$B = \frac{12}{N(N^2 - 1)} \Sigma X_1 Y$.

$C = \frac{180}{N(N^2 - 1)(N^2 - 4)} \Sigma X_2 Y$.

$D = \frac{2{,}800}{N(N^2 - 1)(N^2 - 4)(N^2 - 9)} \Sigma X_3 Y$.

Coefficient of $X_r = \frac{(2r)!\,(2r+1)!}{(r!)^4 N(N^2 - 1)(N^2 - 4) \cdots (N^2 - r^2)} \Sigma X_r Y$.

In obtaining the trend values, the constants are multiplied by $X_1$, $X_2$,
and $X_3$ instead of X, $X^2$, and $X^3$.

³ See R. A. Fisher, Statistical Methods for Research Workers, Oliver and Boyd,
Edinburgh, 1936 (sixth edition), pp. 14y-15U, and Hafner Publishing Co., New York,
1950 (eleventh edition), pp. 147-153. See also R. A. Fisher and F. Yates, Statistical
Tables for Biological, Agricultural and Medical Research, Hafner Publishing Co., New
York, 1949 (third edition), pp. 23-25 and 70-80.
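The recurrence for $X_r$ and the general coefficient formula lend themselves to a short program. The sketch below is an illustration under stated assumptions: the function names are hypothetical, exact fractions are used so that rounding does not blur the results, and the gypsum production figures (thousands of short tons, 1935-1952) serve as the Y series. Because N = 18 is even, the X values here run -8.5, -7.5, ..., 8.5 in units of one year.

```python
from fractions import Fraction
from math import factorial


def centered_x(n):
    """X values with origin at the middle: integers if n is odd, halves if even."""
    return [Fraction(2 * i - (n - 1), 2) for i in range(n)]


def orthogonal_terms(n, degree):
    """Return [X_0, X_1, ..., X_degree]; each is a list over the n time units."""
    x1 = centered_x(n)
    terms = [[Fraction(1)] * n, x1]
    for r in range(2, degree + 1):
        k = Fraction((r - 1) ** 2 * (n ** 2 - (r - 1) ** 2),
                     4 * (4 * (r - 1) ** 2 - 1))
        terms.append([x * prev - k * prev2
                      for x, prev, prev2 in zip(x1, terms[r - 1], terms[r - 2])])
    return terms[: degree + 1]


def coefficient(n, r, xr, y):
    """Constant attached to X_r from the general formula (r = 0 gives A)."""
    denom = Fraction(factorial(r) ** 4 * n)
    for j in range(1, r + 1):
        denom *= n ** 2 - j ** 2
    scale = Fraction(factorial(2 * r) * factorial(2 * r + 1)) / denom
    return scale * sum(t * v for t, v in zip(xr, y))


production = [1881, 2676, 3014, 2671, 3196, 3664, 4706, 4634, 3919,
              3754, 3802, 5615, 6198, 7044, 6491, 8119, 8705, 8070]
n = len(production)
x0, x1, x2, x3 = orthogonal_terms(n, 3)
A, B, C, D = (coefficient(n, r, xr, production)
              for r, xr in enumerate((x0, x1, x2, x3)))

second_degree_trend = [float(A + B * p + C * q) for p, q in zip(x1, x2)]
third_degree_trend = [float(A + B * p + C * q + D * s)
                      for p, q, s in zip(x1, x2, x3)]
print(float(A), float(B), float(C), float(D))
```

Going from the second-degree to the third-degree trend only adds the term in D; A, B, and C are reused unchanged, which is the convenience the text describes.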


USE OF LOGARITHMS

Straight line fitted to logarithms. A glance at Chart 13.4 makes it
quite apparent that a curve of the type $Y_c = a + bX$ would not be a
satisfactory description of the trend of the production of asphalt for the
period shown. A second-degree curve might be used, but a more logical trend
equation is available. A second-degree curve fitted to this series would
behave in such a fashion that the amount of increase each year would be
increasing by a constant amount; this is the same thing as saying that the
second difference of the trend values is a constant, but with the additional
provisos (1) that the trend is upward and (2) that the second differences
are positive. Now, a curve of the type $Y_c = ab^X$ indicates a constant
ratio of change, and, if such a curve were to be fitted to the data of Chart
13.4, it is clear that the ratio would be greater than 1.0 rather than less
than 1.0. That is to say, the series is increasing. The data of asphalt
production have been plotted on semi-logarithmic paper in Chart 13.5,
and it appears that the trend, which was not linear in Chart 13.4, is now
linear. This indicates the suitability of the equation type $Y_c = ab^X$, the
exponential curve.



Chart 13.4. United States Production of Asphalt from Petroleum, 1941-
1952, and Trend as Shown by a Straight Line Fitted to Logarithms of the
Data. Note that this chart has an arithmetic vertical scale and that the
trend line is slightly curved. Data of Table 13.2.

It is not possible to fit the exponential curve directly to the Y values
by least squares; we can, however, make a least-squares fit to the
logarithms of the original data,⁴ and this results in minimizing the squared
deviations of the logarithms of the observed values from the logarithmic
trend values. Putting the exponential equation in logarithmic form gives

$\log Y_c = \log a + X \log b$,

which is a straight line in terms of X and log Y. The normal equations
are

I. $\Sigma \log Y = N \log a + \log b\,\Sigma X$.

II. $\Sigma X \log Y = \log a\,\Sigma X + \log b\,\Sigma X^2$.

Since the X origin may be taken at the middle of the period, $\Sigma X = 0$; so
these equations may be written

I. $\Sigma \log Y = N \log a$.

II. $\Sigma X \log Y = \log b\,\Sigma X^2$.

⁴ This equation may be fitted to the Y values by a method described in James W.
Glover, Tables of Applied Mathematics in Finance, Insurance, Statistics, George
Wahr, Ann Arbor, Mich., 1923, pp. 468-481. Glover's method results in a and b values
such that $\Sigma Y_c = \Sigma Y$ and $\Sigma XY_c = \Sigma XY$, with the origin
taken at the first year. It is not a least-squares fit.

Chart 13.5. United States Production of Asphalt from Petroleum, 1941-
1952, and Trend as Shown by a Straight Line Fitted to the Logarithms of the
Data. Note that this chart has a logarithmic vertical scale and that the
trend is linear. Data of Table 13.2.

Using the summations shown in Table 13.2 and getting $\Sigma X^2$ from
Appendix C, we have

I. 47.145300 = 12 log a,
log a = 3.928775.

II. 8.025212 = 572 log b,
log b = 0.0140301.

$\log Y_c = 3.928775 + 0.0140301X$.

Origin, 1946-1947; X units, ½ year.

To obtain a and b, we look up the anti-logarithms of log a and log b and
we can then write the trend equation in natural form:

$Y_c = (8{,}487.4)(1.0328)^X$.

Origin, 1946-1947; X units, ½ year.

The log $Y_c$ values and the $Y_c$ values for each year are shown in the last
two columns of Table 13.2. The $Y_c$ trend values are shown on both
Charts 13.4 and 13.5. To draw the trend on Chart 13.5, it was merely
necessary to obtain the $Y_c$ values for 1941 and for 1952, to plot those two
values, and to connect them with a straight line. Drawing the trend on
Chart 13.4 requires plotting all, or nearly all, of the trend values.

TABLE 13.2

Computation of Values for Fit of Straight Line to Logarithms of United
States Production of Asphalt from Petroleum, 1941-1952
(Thousands of short tons)

                                                         Trend values
Year     X   Production     Log Y       X log Y       Log Y_c      Y_c
                 Y
1941   -11      6,558     3.816771   -41.984481     3.774444      5,949
1942    -9      6,296     3.799065   -34.191585     3.802504      6,346
1943    -7      6,757     3.829754   -26.808278     3.830564      6,770
1944    -5      6,996     3.844850   -19.224250     3.858624      7,221
1945    -3      7,127     3.852907   -11.558721     3.886685      7,703
1946    -1      8,166     3.912009    -3.912009     3.914745      8,218
1947     1      8,962     3.952405     3.952405     3.942805      8,766
1948     3      9,440     3.974972    11.924916     3.970865      9,351
1949     5      8,910     3.949878    19.749390     3.998926      9,975
1950     7     10,589     4.024855    28.173985     4.026986     10,641
1951     9     12,055     4.081167    36.730503     4.055046     11,351
1952    11     12,784     4.106667    45.173337     4.083106     12,109
Total    0               47.145300     8.025212

Data from U. S. Department of Commerce, Office of Business Economics, Business
Statistics, 1953 Biennial Edition, p. 174.
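The fit to logarithms can be reproduced with the short sketch below. It assumes the production figures of Table 13.2 and common (base 10) logarithms, as in the text; the variable names are illustrative. Because the logarithms are computed to full machine precision rather than to six decimals, the results may differ from the tabled values in the last place.

```python
from math import log10

# United States production of asphalt from petroleum, 1941-1952,
# thousands of short tons (Table 13.2).
asphalt = [6558, 6296, 6757, 6996, 7127, 8166,
           8962, 9440, 8910, 10589, 12055, 12784]
n = len(asphalt)
xs = [2 * i - (n - 1) for i in range(n)]        # -11, -9, ..., 11 (half-year units)
logs = [log10(y) for y in asphalt]

log_a = sum(logs) / n                            # normal equation I
log_b = (sum(x * v for x, v in zip(xs, logs))
         / sum(x * x for x in xs))               # normal equation II
a, b = 10 ** log_a, 10 ** log_b

trend = [a * b ** x for x in xs]
annual_ratio = b ** 2                            # two X units make one year

print(round(log_a, 6), round(log_b, 7))
print(round(a, 1), round(b, 4), round(annual_ratio, 4))
print([round(t) for t in trend])
```

Since X is measured in half years, b is the ratio of change per half year, and the ratio from one year's trend value to the next is b squared.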

The trend equation, written in the form

$Y_c = (8{,}487.4)(1.0328)^X$,

tells us that 8,487.4 thousand short tons was the trend value for a point
midway between 1946 and 1947, and that, during the period under
consideration, the production of asphalt grew by 3.28 per cent per X unit of
one-half year, or about 6.67 per cent per year (since 1.0328 squared is
1.0667). Incidentally, 8,487.4 thousand short tons is the geometric mean of
the Y values. Since the geometric mean is always a little smaller than the
arithmetic mean, and since the sum of the squares of the deviations of the
logarithms (rather than of the original data) is at a minimum for this
trend, it follows that the sum of the deviations above the trend line of
Chart 13.4 is slightly larger than the sum of those below it. This
constitutes a minor shortcoming of this type of trend. However, the measured
deviations on either side of the trend line in Chart 13.5 do cancel. In
addition, there is some merit in the fact that the use of logarithms
equalizes the importance of fluctuations in regard to their relative, rather
than in regard to their absolute, deviations from the trend. This is
particularly pertinent when there are small cyclical variations about the
lower portion of the trend and larger (that is, larger absolutely) cyclical
variations about the upper part of the trend. In such a situation, the trend
line is more likely to pass through all of the cycles rather than through
only the larger ones. This point may more than offset the technical
disadvantage of fitting to the logarithms.

Chart 13.6. Domestic Consumption of Rayon Filament Yarn, 1920-1952,
and Trend as Shown by a Second-Degree Curve Fitted to Logarithms of
the Data. Data of Table 13.3.

Second-degree curve fitted to logarithms. Sometimes data are
encountered which, when plotted on semi-logarithmic paper, continue to
show curvature, being concave either upward or downward. Chart 13.6
and Table 13.3 show such a series, the domestic consumption of rayon
filament yarn for 1920-1952, which is concave downward, indicating that
the ratio of increase has been decreasing. We may fit a second-degree
curve to the logarithms of the Y values, using

$\log Y_c = \log a + X \log b + X^2 \log c$.

Taking the X origin at the middle of the period, the three normal
equations are

I. $\Sigma \log Y = N \log a + \log c\,\Sigma X^2$.

II. $\Sigma X \log Y = \log b\,\Sigma X^2$.

III. $\Sigma X^2 \log Y = \log a\,\Sigma X^2 + \log c\,\Sigma X^4$.

From Appendix B we ascertain that $\Sigma X^2$ = 2(1,496) = 2,992 and
$\Sigma X^4$ = 2(234,848) = 469,696. All of the other values may be had from
Table 13.3, and we solve the normal equations as follows:

II. $\Sigma X \log Y = \log b\,\Sigma X^2$.

160.140215 = 2,992 log b,

log b = 0.0535228.

I. $\Sigma \log Y = N \log a + \log c\,\Sigma X^2$.

III. $\Sigma X^2 \log Y = \log a\,\Sigma X^2 + \log c\,\Sigma X^4$.

I. 76.269235 = 33 log a + 2,992 log c.

III. 6,582.178801 = 2,992 log a + 469,696 log c.

(I × 90.666667). 6,915.077332 = 2,992 log a + 271,274.67 log c.

III. 6,582.178801 = 2,992 log a + 469,696 log c.

332.898531 = -198,421.33 log c.

log c = -0.00167774.

I. 76.269235 = 33 log a + (2,992)(-0.00167774).

33 log a = 81.289033.
log a = 2.463304.

Check, using III. 6,582.178801 = (2,992)(2.463304)

+ (469,696)(-0.00167774),

= 6,582.178801.

Trend equation: $\log Y_c = 2.463304 + 0.0535228X - 0.00167774X^2$.
Origin, 1936. X units, 1 year.
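The solution of these three equations can be checked mechanically. The sketch below is an illustration only: the function name is hypothetical, the sums are those quoted in the text, and Equations I and III are solved by determinants. One point deserves a hedge. Summing $X^4$ directly over X = -16, ..., 16 gives 487,696, not the 469,696 quoted above, so a reader relying on these constants may wish to recheck that figure against Appendix B. The call below keeps the text's own inputs in order to reproduce its printed results.

```python
def fit_log_second_degree(n, sx2, sx4, s_log, s_xlog, s_x2log):
    """Solve the three normal equations for log a, log b, and log c.

    n       number of items in the series
    sx2     sum of X^2;  sx4  sum of X^4  (origin at the middle of the period)
    s_log   sum of log Y;  s_xlog  sum of X log Y;  s_x2log  sum of X^2 log Y
    """
    log_b = s_xlog / sx2                         # equation II
    det = n * sx4 - sx2 ** 2                     # equations I and III
    log_a = (s_log * sx4 - sx2 * s_x2log) / det
    log_c = (n * s_x2log - sx2 * s_log) / det
    return log_a, log_b, log_c


# Power sums computed directly for X = -16, ..., 16 (N = 33).
xs = range(-16, 17)
sx2_direct = sum(x ** 2 for x in xs)
sx4_direct = sum(x ** 4 for x in xs)

# Inputs exactly as quoted in the text.
log_a, log_b, log_c = fit_log_second_degree(
    33, 2992, 469696, 76.269235, 160.140215, 6582.178801)

# X at which log Y_c (and hence Y_c) reaches its maximum.
turning_x = -log_b / (2 * log_c)

print(sx2_direct, sx4_direct)
print(round(log_a, 6), round(log_b, 7), round(log_c, 8))
print(round(turning_x, 2))
```

With the text's constants the maximum falls a little short of 16 years after the origin, which is consistent with a trend that turns down shortly after 1952.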



TABLE 13.3 (figures illegible in this copy): domestic consumption of rayon
filament yarn, 1920-1952, with the logarithms and trend computations for the
second-degree curve fitted to logarithms of the data.


The procedure for computing the trend values is indicated in Table
13.3. The trend is shown graphically in Chart 13.6. Two comments
are in order concerning this trend: first, it is not a particularly good
description of the series; second, the trend, if extended, would begin to
drop off in 1953! Two trends, one fitted to data for 1920-1937 and the
other to 1938-1952 data, might be a better description. A Gompertz
curve (see Charts 13.10 and 13.11) is a much better description and does
not turn down.


ASYMPTOTIC GROWTH CURVES 

The straight line $Y_c = a + bX$, which was discussed in the preceding
chapter, describes a constant amount of increase or decrease. The
exponential curve, $Y_c = ab^X$, involves a constant ratio of change and,
therefore, a constant ratio of change in the amount of change. If b is a
positive number greater than one, the trend is upward and the amount of
change is undergoing a constant percentage of increase; if b is a positive
number smaller than one, the trend is downward and the amount of
change shows a constant percentage of decrease.

Over long periods of time, chronological series are not likely to show
either a constant amount of change or a constant ratio of change. It is
much more likely that an increasing⁵ series will show an increasing amount
of change but a decreasing ratio of change. This is true of the data of
Charts 13.10 and 13.11, which show domestic consumption of rayon
filament yarn.

It is also possible that an increasing series may show a decline in the
amount of increase. Decreasing absolute growth is not often encountered,
but we shall discuss one curve of this type, the modified exponential,
since it serves as an excellent introduction to the more important
Gompertz and logistic curves. Before beginning a consideration of the
modified exponential curve, passing mention may be made of three other
curve types which may describe a decreasing amount of growth. These
are:

(1) Modified polynomials, such as $Y_c = a + bX^{1/2}$, $Y_c = a + bX^{1/2} + cX$,
and others. When three or more constants are present, one (or more) 
constants may be negative, in which case the curve may ultimately turn 
down. 

(2) Straight line to log X. The expression is $Y_c = a + b \log X$.
This curve type should not be used unless there is a logical justification
for considering the logarithms of time.

(3) A parabolic curve to log Y, which is written $\log Y_c = aX^b$, may
be fitted by least squares by writing it $\log \log Y_c = \log a + b \log X$.

Note that, when using the logarithm of X, the X origin cannot be taken
at the middle of the period.

⁵ Series which are declining may show a decreasing amount of change. The
decreasing amount of change may represent a decreasing or constant (but usually
decreasing) ratio of change. To avoid possible confusion, most of the discussion
concerning asymptotic growth curves will deal with increasing series.

The modified exponential. This curve not only describes a trend in
which the amount of growth declines by a constant percentage, but the
curve also approaches an upper limit, called the asymptote. This is an
important property of growth curves, since many time series seem to
approach an upper limit.

TABLE 13.4

Hypothetical Data for Modified Exponential Curve
(Asymptote k = 114)

  X         Y        Partial totals    Increment    Per cent of preceding
                                                          increment
 (1)       (2)            (3)             (4)               (5)
  0      50.0000
  1      66.0000       116.0000         16.0000
  2      78.0000                        12.0000              75
  3      87.0000       165.0000          9.0000              75
  4      93.7500                         6.7500              75
  5      98.8125       192.5625          5.0625              75

The equation of the modified exponential is

$Y_c = k + ab^X$,

where k is the asymptote.
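How the curve behaves depends on the sign of a and on whether b is less than or greater than one. The sketch below tabulates a few values for each of the four combinations; all of the constants are illustrative choices made for this example (only k = 114, a = -64, b = 0.75 correspond to Table 13.4), and the function name is hypothetical.

```python
def modified_exponential(k, a, b, x):
    """Trend value Y_c = k + a * b**x."""
    return k + a * b ** x


# (description, k, a, b): one illustrative set of constants per combination.
cases = [
    ("a negative, b less than one", 114, -64, 0.75),
    ("a negative, b greater than one", 114, -4, 1.5),
    ("a positive, b less than one", 50, 64, 0.75),
    ("a positive, b greater than one", 50, 4, 1.5),
]

for label, k, a, b in cases:
    values = [modified_exponential(k, a, b, x) for x in range(6)]
    direction = "rising" if values[-1] > values[0] else "falling"
    print(f"{label}: {direction}", [round(v, 2) for v in values])
```

Only the first combination rises toward k from below; the third falls toward k from above, and the other two move away from k as X increases.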

As noted in footnote 5, we shall give our attention primarily to increas-
ing series, but Chart 13.7 shows four shapes which this equation may
assume. It must be clear that our interest centers on part 1 of Chart
13.7, since that is the only one of the four which represents an increasing
series with an upper asymptote. There are occasions when one might
wish to use a trend like that in part 3 of Chart 13.7. This would be
true for a declining series tending to have a constant percentage of
decrease in the amount of decrease. Death rates from a specific disease
may behave in this fashion.

The reader may find it illuminating to substitute various values for k,
a, and b in the equation for the modified exponential and to draw for
himself curves like those shown in Chart 13.7. This will provide him
with specific illustrations of the situations stated generally in that chart.
Note that negative values of b are of no interest to us.

The first two columns of Table 13.4 show a series which has a constant
percentage decrease in its amount of growth. As can be seen in Columns
4 and 5, each first difference is 75 per cent of the preceding first difference.
The increments of increase are $\Delta_1$, $\Delta_2$, $\Delta_3$, $\Delta_4$, and $\Delta_5$, and

$$\frac{\Delta_2}{\Delta_1} = \frac{\Delta_3}{\Delta_2} = \frac{\Delta_4}{\Delta_3} = \frac{\Delta_5}{\Delta_4} = 0.75.$$

Chart 13.7. Four Forms of the Modified Exponential Curve, $Y_c = k + ab^X$.
(1) When a is negative and b is less than one. (2) When a is negative and
b is greater than one. (3) When a is positive and b is less than one.
(4) When a is positive and b is greater than one.

Chart 13.8. A Modified Exponential Equation Fitted to the Data of
Table 13.4.

Referring to Chart 13.8, the horizontal broken line near the top of the
chart is the value k that the curve of this series approaches; in this case k
is 114. This means that, if we should extend the trend line indefinitely,
it would approach closer and closer to this value, but never quite equal it.
The second constant, a, the value obtained by subtracting the asymptote
k from the trend value when X is zero, in this instance is -64. The
third constant, b, is, of course, the ratio between successive increments of
growth, or 0.75 for this series. In Chart 13.8 the vertical broken line
when X = 1 is $-64(0.75) = -48$; when X = 2, it is $-64(0.75)^2 = -36$;
and so on for the other values of X. Thus these vertical broken lines are
described by the expression $ab^X$. This is true when X = 0 also, since
$-64(0.75)^0 = -64$. In the diagram, a is represented by the height of
the shaded area. If now, in turn, we subtract from k the value of each
of the vertical broken lines, we have the trend values. The vertical
broken lines are subtracted from k because the sign of a is negative.
Thus:

  X      k + ab^X            Y_c

  0      114 - 64          = 50
  1      114 - 48          = 66
  2      114 - 36          = 78
  3      114 - 27          = 87
  4      114 - 20.25       = 93.75
  5      114 - 15.1875     = 98.8125

Since the sign of a is negative, the increments of growth are declining.
As is already obvious, for this series of data the equation is
$Y_c = 114 - 64(0.75)^X$.
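
The arithmetic of this small table is easily retraced by machine. The following Python sketch, offered only as an illustration, evaluates $Y_c = 114 - 64(0.75)^X$ for X = 0 to 5 and confirms that each first difference is 75 per cent of the preceding one. The function name `modified_exponential` is our own choice and does not come from the text.

```python
def modified_exponential(k, a, b, xs):
    """Return the trend values Yc = k + a * b**X for each X in xs."""
    return [k + a * b ** x for x in xs]


# Constants of the hypothetical series of Table 13.4.
k, a, b = 114.0, -64.0, 0.75

trend = modified_exponential(k, a, b, range(6))

# First differences (increments of growth) and the ratio of each
# increment to the one before it.
first_diffs = [later - earlier for earlier, later in zip(trend, trend[1:])]
ratios = [later / earlier for earlier, later in zip(first_diffs, first_diffs[1:])]

for x, y in enumerate(trend):
    print(x, y)
print("first differences:", first_diffs)
print("ratios of successive first differences:", ratios)
```

Changing the signs of `a` and the size of `b` in this sketch is one convenient way of producing the four shapes of Chart 13.7.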

This curve has three constants: k, the asymptote; a, the distance
between the value of $Y_c$ when X = 0 and the asymptote; and b, the ratio
between successive first differences. Three equations are therefore
required for fitting it. These are obtained by first dividing the data into
three equal sections, as in Table 13.4. Then the Y values are totaled
for each section, as in Column 3. The results are:

For the first third    $\Sigma_1 Y = 116$.

For the second third   $\Sigma_2 Y = 165$.

For the third third    $\Sigma_3 Y = 192.5625$.

Let us note what 116 represents in terms of our equation. It is the sum
of 50 + 66. But 50 is $k + ab^0$ and 66 is $k + ab^1$, so

$$116 = 2k + a + ab.$$

This is Equation I. The other two are obtained in similar fashion. The
three equations are:

I.   $116 = 2k + a + ab$,

II.  $165 = 2k + ab^2 + ab^3$,

III. $192.5625 = 2k + ab^4 + ab^5$.

In order to solve for b, we first subtract Equation I from Equation II,
obtaining Equation A; and then subtract Equation II from Equation III,
obtaining Equation B. Thus:

A. $49 = ab^3 + ab^2 - ab - a = a(b^3 + b^2 - b - 1)$.

B. $27.5625 = ab^5 + ab^4 - ab^3 - ab^2 = ab^2(b^3 + b^2 - b - 1)$.

The constant b is now obtained by dividing Equation B by Equation A.
We shall call the resulting equation C.

C. $$\frac{27.5625}{49} = \frac{ab^2(b^3 + b^2 - b - 1)}{a(b^3 + b^2 - b - 1)},$$

$$b^2 = 0.5625,$$

$$b = 0.75.$$

The value of a may now be gotten by substituting in Equation A or B.

A. $49 = a(0.75^3 + 0.75^2 - 0.75 - 1)$,

$$a = \frac{49}{-0.765625} = -64.$$

The remaining constant k may be computed by substituting the values
of a and b in any of the original equations.

I. $116 = 2k - 64 - 64(0.75)$,

$$2k = 228,$$

$$k = 114.$$

The values of the constants are thus found to be those which we knew to
be correct. The equation was not obtained by the method of least
squares, but was so fitted that the three partial totals of the trend values
were the same as those of the original data. In this case, since the original
data conform to the equation type perfectly, the fitted curve passes
through all of the original points.

The logical procedure, which has been explained, can be developed into
more convenient formulas, which are as follows:⁶

$$b^n = \frac{\Sigma_3 Y - \Sigma_2 Y}{\Sigma_2 Y - \Sigma_1 Y},$$

$$a = (\Sigma_2 Y - \Sigma_1 Y)\,\frac{b - 1}{(b^n - 1)^2},$$

$$k = \frac{1}{n}\left[\Sigma_1 Y - \frac{b^n - 1}{b - 1}\,a\right],$$

where n is the number of years in each third of the data. Solving by
these formulas requires, of course, that b be obtained first, then a, and
finally k.

If the expressions for a and b are substituted in the expression just
given for k, we obtain

$$k = \frac{1}{n}\left[\frac{(\Sigma_1 Y)(\Sigma_3 Y) - (\Sigma_2 Y)^2}{\Sigma_1 Y + \Sigma_3 Y - 2\,\Sigma_2 Y}\right],$$

which enables us to obtain the asymptote without first computing a and b.

Since time series do not often behave in such a manner that a modified
exponential is a logical fit or a good description of the series, no illustration
is given of the fit of $Y_c = k + ab^X$ to a set of actual data. As noted
earlier, the treatment of the modified exponential curve is intended as an
introduction to the two other growth curves to be discussed in the follow-
ing pages.
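
For the reader who wishes to try the method of partial totals on other figures, the formulas may be sketched in Python as follows. This is a minimal illustration and not a finished routine: the function names are our own, X is assumed to run 0, 1, 2, ..., and the number of observations is assumed to be a multiple of three. The hypothetical series of Table 13.4 is used as the trial data.

```python
def partial_totals(y):
    """Split y into three equal sections and return (n, S1, S2, S3)."""
    if len(y) == 0 or len(y) % 3 != 0:
        raise ValueError("the number of observations must be a positive multiple of three")
    n = len(y) // 3
    return n, sum(y[:n]), sum(y[n:2 * n]), sum(y[2 * n:])


def fit_modified_exponential(y):
    """Fit Yc = k + a*b**X (X = 0, 1, 2, ...) by the method of partial totals.

    b is obtained first, then a, and finally k. The formulas fail
    (division by zero) if S2 equals S1 or if b equals one.
    """
    n, s1, s2, s3 = partial_totals(y)
    b_n = (s3 - s2) / (s2 - s1)
    if b_n <= 0:
        raise ValueError("the partial totals do not yield a positive value of b")
    b = b_n ** (1.0 / n)
    a = (s2 - s1) * (b - 1.0) / (b_n - 1.0) ** 2
    k = (s1 - (b_n - 1.0) / (b - 1.0) * a) / n
    return k, a, b


def asymptote_from_totals(y):
    """Obtain k directly from the three partial totals."""
    n, s1, s2, s3 = partial_totals(y)
    return (s1 * s3 - s2 ** 2) / (s1 + s3 - 2.0 * s2) / n


y = [50, 66, 78, 87, 93.75, 98.8125]   # Table 13.4
k, a, b = fit_modified_exponential(y)
k_direct = asymptote_from_totals(y)
print(k, a, b, k_direct)
```

Both routes to the asymptote should agree, which provides the same kind of check on the arithmetic as is used in the hand computation.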

The Gompertz curve. In the form which is of primary concern to
us, the Gompertz curve describes a trend in which the growth increments
of the logarithms are declining by a constant percentage. Thus, the
natural values of the trend would show a declining ratio of increase, but
the ratio does not decrease by either a constant amount or a constant per-
centage. The equation for the Gompertz curve is

$$Y_c = ka^{b^X},$$

which may be put in logarithmic form

$$\log Y_c = \log k + (\log a)\,b^X.$$

Chart 13.9. Four Forms of the Gompertz Curve, $Y_c = ka^{b^X}$. The vertical
values at the points marked (*) are antilog (log k + log a). (1) When
log a is negative and b is less than one. (2) When log a is negative and b
is greater than one. (3) When log a is positive and b is less than one.
(4) When log a is positive and b is greater than one.

The four parts of Chart 13.9 show four shapes which the Gompertz
equation may assume. While the statistician might occasionally find
use for the Gompertz curve to describe trends of the types shown⁷ in
parts 2 and 3 of Chart 13.9, our major interest centers in the form shown
in part 1 of the chart. This curve (and also the curve in part 2) has an
upper and a lower asymptote, the lower asymptote being zero. Only
positive values of b are considered in Chart 13.9, since negative values of
b do not yield useful curves.

⁶ The derivation of these formulas is given in Appendix S, section 13.1.

⁷ Deaths of railway employees, accidents in factories, specific death rates, and other
declining series might be described by a Gompertz curve having a lower asymptote at
the right. Whether there is or is not an upper asymptote will depend upon the
behavior of the data to which the curve is fitted.

Whatever has been said about the behavior of the modified exponential
curve applies also to the logarithmic form of the Gompertz curve. The
Gompertz curves shown in Chart 13.9 would, if put in logarithmic form
(or plotted on semi-logarithmic paper), look like the corresponding parts
of Chart 13.7. The fitting of the Gompertz curve is to the logarithms of
the observed data and may be accomplished in a manner⁸ exactly paral-
leling the fit of the modified exponential. The expressions are

$$b^n = \frac{\Sigma_3 \log Y - \Sigma_2 \log Y}{\Sigma_2 \log Y - \Sigma_1 \log Y},$$

$$\log a = (\Sigma_2 \log Y - \Sigma_1 \log Y)\,\frac{b - 1}{(b^n - 1)^2},$$

$$\log k = \frac{1}{n}\left[\Sigma_1 \log Y - \frac{b^n - 1}{b - 1}\,\log a\right].$$

If it is desired to obtain the value of k without first computing log a and
b, use

$$\log k = \frac{1}{n}\left[\frac{(\Sigma_1 \log Y)(\Sigma_3 \log Y) - (\Sigma_2 \log Y)^2}{\Sigma_1 \log Y + \Sigma_3 \log Y - 2\,\Sigma_2 \log Y}\right].$$

Using this expression first enables one quickly to ascertain if the upward
trend has an upper asymptote; computing k in this manner also pro-
vides a check of the value of the k obtained by the formula first given.
Whether or not there is an upper asymptote for an increasing series may
also be ascertained by noting if $(\Sigma_3 \log Y - \Sigma_2 \log Y)$ is greater than or
less than $(\Sigma_2 \log Y - \Sigma_1 \log Y)$. If the first difference exceeds the
second difference, $b^n$ (and, therefore, b) is greater than one, and there is
no upper asymptote for the increasing series; the curve of such an increas-
ing series would resemble that shown in part 4 of Chart 13.9. If the first
difference is less than the second, b is less than one, and the curve of an
increasing series would look like part 1 of Chart 13.9.
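
The test just described amounts to a single comparison, which may be sketched in Python as follows. The function name and the error check are our own additions for illustration; the three arguments are the partial sums of the logarithms for the first, second, and third thirds of an increasing series. The trial figures are the three partial sums of Table 13.5.

```python
def increasing_series_has_upper_asymptote(s1, s2, s3):
    """Return True if b < 1, i.e. if (S3 - S2) is less than (S2 - S1).

    s1, s2, s3 are the partial sums of log Y for the three thirds of an
    increasing series, so both differences are expected to be positive.
    """
    later_diff = s3 - s2     # (Sigma_3 log Y - Sigma_2 log Y)
    earlier_diff = s2 - s1   # (Sigma_2 log Y - Sigma_1 log Y)
    if later_diff <= 0 or earlier_diff <= 0:
        raise ValueError("the partial sums do not describe an increasing series")
    # b**n is the ratio of the two differences; b < 1 exactly when that ratio is < 1.
    return later_diff < earlier_diff


rayon = increasing_series_has_upper_asymptote(18.504346, 26.536108, 31.228781)
print(rayon)
```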

The data of Table 13.5, which are shown also in Charts 13.10 and 13.11,
will serve as the basis for an illustration of the fit of the Gompertz curve.
The computation of the required sums of the logarithms is carried out in
the fourth column of Table 13.5. Using the expressions previously
given, we obtain

$$b^n = \frac{\Sigma_3 \log Y - \Sigma_2 \log Y}{\Sigma_2 \log Y - \Sigma_1 \log Y},$$

$$b^{11} = \frac{31.228781 - 26.536108}{26.536108 - 18.504346} = \frac{4.692673}{8.031762} = 0.58426445.$$

$$\log b^{11} = 9.76660946 - 10 = 109.76660946 - 110.$$

$$\log b = 9.97878268 - 10.$$

$$b = 0.95231950.$$

$$\log a = (\Sigma_2 \log Y - \Sigma_1 \log Y)\,\frac{b - 1}{(b^n - 1)^2} = 8.031762\,\frac{-0.04768050}{(-0.41573555)^2} = 8.031762\,\frac{-0.04768050}{0.17283605}$$

$$= (8.031762)(-0.27587127) = -2.2157324.$$

$$\log k = \frac{1}{n}\left[\Sigma_1 \log Y - \frac{b^n - 1}{b - 1}\,\log a\right] = \frac{1}{11}\left[18.504346 - \left(\frac{-0.41573555}{-0.04768050}\right)(-2.2157324)\right] = 3.438523.$$

Check, using

$$\log k = \frac{1}{n}\left[\frac{(\Sigma_1 \log Y)(\Sigma_3 \log Y) - (\Sigma_2 \log Y)^2}{\Sigma_1 \log Y + \Sigma_3 \log Y - 2\,\Sigma_2 \log Y}\right] = \frac{1}{11}\left[\frac{(18.504346)(31.228781) - (26.536108)^2}{18.504346 + 31.228781 - 2(26.536108)}\right] = 3.438522.$$

Trend equation:

$$\log Y_c = 3.438522 - 2.2157324(0.9523195)^X.$$

$$Y_c = 2{,}744.9\,(0.00608509)^{0.9523195^X}.$$

Origin, 1920. X units, 1 year.

⁸ A number of Gompertz curves, fitted by a method different from that described in
this text, may be seen in Growth Patterns in Industry, National Industrial Conference
Board, New York, 1952.

Chart 13.10. Domestic Consumption of Rayon Filament Yarn, 1920-1952,
and Trend as Shown by a Gompertz Curve. Note that this chart has an
arithmetic vertical scale. The Gompertz curve has been extended to show
the general shape of the curve. Data from Table 13.5.

Chart 13.11. Domestic Consumption of Rayon Filament Yarn, 1912-1952,
and Trend as Shown by a Gompertz Curve Fitted to Data for 1920-1952.
Note that this chart has a logarithmic vertical scale. The Gompertz curve
has been extended to show the general shape of the curve. Data for
1920-1952 from Table 13.5; data for 1912-1919 from the source given below
that table.

TABLE 13.5

Computation of Values for Fit of Gompertz Curve to Domestic Consump-
tion of Rayon Filament Yarn, 1920-1952

(Millions of pounds)

                                     Computation of trend values
              Con-
              sump-                                        Log Y_c =
Year     X    tion    Log Y      b^X        (Log a)b^X   log k +       Y_c
              Y                                           (log a)b^X

1920     0     8.7   0.939519   1.0000000   -2.215732    1.222791     16.7
1921     1    19.8   1.296665   0.9523195   -2.110085    1.328438     21.3
1922     2    24.7   1.392697   0.9069124   -2.009475    1.429048     26.9
1923     3    32.5   1.511883   0.8636704   -1.913662    1.524861     33.5
1924     4    42.2   1.625312   0.8224902   -1.822418    1.616105     41.3
1925     5    58.2   1.764923   0.7832735   -1.735524    1.702999     50.5
1926     6    60.6   1.782473   0.7459266   -1.652774    1.785749     61.1
1927     7   100.0   2.000000   0.7103604   -1.573969    1.864554     73.2
1928     8   100.1   2.000434   0.6764901   -1.498921    1.939602     87.0
1929     9   131.5   2.118926   0.6442347   -1.427452    2.011071    102.6
1930    10   117.9   2.071514   0.6135173   -1.359390    2.079133    120.0
Σ1 log Y            18.504346                           18.504351 ✓

1931    11   157.3   2.196729   0.5842645   -1.294574    2.143949    139.3
1932    12   152.0   2.181844   0.5564065   -1.232848    2.205675    160.6
1933    13   211.8   2.325926   0.5298768   -1.174065    2.264458    183.8
1934    14   194.8   2.289589   0.5046120   -1.118085    2.320438    209.1
1935    15   252.7   2.402605   0.4805518   -1.064774    2.373749    236.5
1936    16   297.6   2.473633   0.4576388   -1.014005    2.424518    265.8
1937    17   267.1   2.426674   0.4358184   -0.965657    2.472866    297.1
1938    18   274.1   2.437909   0.4150384   -0.919614    2.518909    330.3
1939    19   359.8   2.556061   0.3952492   -0.875766    2.562757    365.4
1940    20   388.7   2.589615   0.3764035   -0.834009    2.604514    402.3
1941    21   452.4   2.655523   0.3584564   -0.794243    2.644280    440.8
Σ2 log Y            26.536108                           26.536113 ✓

1942    22   468.8   2.670988   0.3413650   -0.756373    2.682150    481.0
1943    23   494.2   2.693903   0.3250885   -0.720309    2.718214    522.7
1944    24   539.1   2.731669   0.3095881   -0.685964    2.752559    565.7
1945    25   602.4   2.779885   0.2948268   -0.653257    2.785266    609.9
1946    26   666.5   2.823800   0.2807693   -0.622110    2.816413    655.3
1947    27   729.3   2.862906   0.2673821   -0.592447    2.846076    701.6
1948    28   846.7   2.927730   0.2546332   -0.564199    2.874324    748.7
1949    29   782.7   2.893595   0.2424922   -0.537298    2.901225    796.6
1950    30   955.5   2.980231   0.2309301   -0.511679    2.926844    845.0
1951    31   865.4   2.937217   0.2199192   -0.487282    2.951241    893.8
1952    32   845.0   2.926857   0.2094333   -0.464048    2.974475    942.9
Σ3 log Y            31.228781                           31.228787 ✓

Data from Textile Economics Bureau, Inc., Textile Organon, Vol. XXIV, No. 2, February 1953,
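
The computation above can be retraced by machine. The following Python sketch, offered only as an illustration, applies the three expressions to the consumption figures as read from Table 13.5. The function name `fit_gompertz` is our own, and the sketch uses full-precision logarithms rather than six-place table values, so the last digits may differ slightly from those printed.

```python
import math

# Domestic consumption of rayon filament yarn, 1920-1952, in millions of
# pounds, as read from the Y column of Table 13.5 (X = 0 at 1920).
consumption = [
    8.7, 19.8, 24.7, 32.5, 42.2, 58.2, 60.6, 100.0, 100.1, 131.5, 117.9,
    157.3, 152.0, 211.8, 194.8, 252.7, 297.6, 267.1, 274.1, 359.8, 388.7, 452.4,
    468.8, 494.2, 539.1, 602.4, 666.5, 729.3, 846.7, 782.7, 955.5, 865.4, 845.0,
]


def fit_gompertz(y):
    """Fit log Yc = log k + (log a) * b**X by partial sums of log Y.

    Returns (log_k, log_a, b) using common (base 10) logarithms.
    The series length must be a multiple of three.
    """
    if len(y) == 0 or len(y) % 3 != 0:
        raise ValueError("the number of observations must be a positive multiple of three")
    n = len(y) // 3
    logs = [math.log10(v) for v in y]
    s1, s2, s3 = sum(logs[:n]), sum(logs[n:2 * n]), sum(logs[2 * n:])
    b_n = (s3 - s2) / (s2 - s1)
    b = b_n ** (1.0 / n)
    log_a = (s2 - s1) * (b - 1.0) / (b_n - 1.0) ** 2
    log_k = (s1 - (b_n - 1.0) / (b - 1.0) * log_a) / n
    return log_k, log_a, b


log_k, log_a, b = fit_gompertz(consumption)
asymptote = 10 ** log_k                       # k, the upper asymptote
trend_1920 = 10 ** (log_k + log_a * b ** 0)   # trend value at the origin

print("b =", round(b, 7))
print("log a =", round(log_a, 6))
print("log k =", round(log_k, 6))
print("k =", round(asymptote, 1))
print("trend value for 1920 =", round(trend_1920, 1))
```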

The natural form of the trend equation is obtained by looking up the
anti-logarithms of log k and log a. Since log a = -2.2157324 is a nega-
tive logarithm, it must be rewritten log a = 7.7842676 - 10 before the
value of a = 0.00608509 can be obtained from Appendix R. Note that
b = 0.9523195, which indicates that the ratio of increase each year is
declining: more specifically, that each difference between successive
logarithmic trend values is about 0.95 times (or 95 per cent of) the pre-
ceding difference. Whenever b < 1, the value of b - 1 is negative,
resulting in a negative value for log a, if $\Sigma_2 \log Y$ exceeds $\Sigma_1 \log Y$.
(See the equation for log a.) If log a is negative, a is less than one.

For our data, when X is zero (the value of X for 1920), $b^X = 1.0$ and
$a^{b^X} = 0.00608509$, with the result that for 1920 $Y_c = (2{,}745)(0.00608509)$
= 16.7, the value shown for 1920 in the last column of Table 13.5. The
greater the value of X, the smaller the value of $b^X$. As X increases, $b^X$
approaches zero and $a^{b^X}$ approaches 1.0, with the result that $Y_c$ approaches
k, or 2,745, the upper asymptote.

The procedure for computing the trend values is shown in Table 13.5.
Note that $\Sigma_1 \log Y_c = \Sigma_1 \log Y$, $\Sigma_2 \log Y_c = \Sigma_2 \log Y$, and $\Sigma_3 \log Y_c =
\Sigma_3 \log Y$ to at least six digits. These agreements⁹ are noted by check
marks in the column headed "Log $Y_c$." The trend values have been
plotted on Charts 13.10 and 13.11 and have been extended in both direc-
tions to indicate more clearly the shape of the fitted curve. The exten-
sion of the trend to 1996 is not intended as a forecast, although the
Gompertz curve is sometimes used to assist in making predictions. The
asymptote is shown on both of the charts, and the approach of the trend
to the asymptote is apparent.

In Chart 13.10 it will be noticed that the amount of growth is small at
first, then becomes larger until it reaches a point of inflection, after which
it declines and finally approaches, but never reaches, zero. This general
shape of the trend is common to many industries and has led Prescott¹⁰
to the conclusion that it describes a law of growth. According to
Prescott, this trend is a function of population growth, the curve of
which typically is similar in appearance, but it is also partly due to the
development of the individual industry. He believes that the growth of
an industry may be divided into four stages:

(1) Period of experimentation,

(2) Period of growth into the social fabric,

(3) Through the point where growth increases but at a diminishing rate,

(4) Period of stability.

These stages are not very specifically demarcated by Prescott, who also
claims for this type of curve that it is useful in forecasting the future of an
industry, since it not only is a logical curve but, on account of its tendency
to flatten out, tends to be conservative in its forecasts. The horizontal
dashed lines of Charts 13.10 and 13.11 would seem to indicate that the
upper limit of rayon filament yarn consumption in the United States
would be about 2,745 million pounds. While this does not appear, from
the charts, to be an unreasonable figure, it may be too low if additional
uses for rayon are found or it may be too high if other synthetic fibers
supplant rayon.

⁹ The values of log b and of b were obtained from a more extensive table of loga-
rithms than the one given in Appendix R in order that these equalities might be close.
Use of Appendix R, together with arithmetic interpolation, for log b and for b yields
the same values as in Table 13.5, but the agreement of the partial sums of the
logarithms is not so exact.

¹⁰ "Law of Growth in Forecasting Demand," by Raymond B. Prescott, Journal
of the American Statistical Association, Vol. XVIII, December 1922, pp. 471-479.

The logistic curve. This curve, which is also known as the Pearl-
Reed curve, is, in its simplest form,

$$\frac{1}{Y_c} = k + ab^X.$$

From this expression it should be clear that it is merely a modified expo-
nential in terms of the reciprocals of the Y values; the first differences
of the reciprocals of the $Y_c$ values are declining by a constant percentage.
A modified exponential could therefore be fitted, by the method of partial
totals, to the reciprocals of the observed Y values, and the reciprocals of
the fitted values so obtained taken as the trend values. However, this
curve is more often written¹¹

$$Y_c = \frac{k}{1 + 10^{a+bX}}$$

and, although the procedure is more subjective, fitted by the method of
selected points. In this form, the logistic curve will always have an
upper asymptote of k and a lower asymptote of zero; it looks like part 1
or part 2 of Chart 13.9. In the form $\frac{1}{Y_c} = k + ab^X$, the logistic could
assume forms similar to all four of those shown in Chart 13.9.

To fit the equation

$$Y_c = \frac{k}{1 + 10^{a+bX}}$$

by the method of selected points requires choosing three years, $x_0$, $x_1$, and
$x_2$, equidistant from each other: one near the beginning of the period, one
in the middle, and one near the end. The three selected values through
which the fitted curve will pass are the Y values associated with these
three years. These Y values are designated $y_0$, $y_1$, and $y_2$. The origin
on the X axis is at the year designated $x_0$, and n is the number of years
from $x_0$ to $x_1$ or from $x_1$ to $x_2$. The three constants are obtained as
follows:

$$k = \frac{2y_0y_1y_2 - y_1^2(y_0 + y_2)}{y_0y_2 - y_1^2},$$

$$a = \log \frac{k - y_0}{y_0},$$

$$b = \frac{1}{n}\left[\log \frac{y_0(k - y_1)}{y_1(k - y_0)}\right].$$

¹¹ Usually e = 2.71828 is used, instead of 10, in the denominator, giving

$$Y_c = \frac{k}{1 + e^{a+bX}}.$$

The a values and the b values in the two forms will differ, but both forms describe
the same curve, and the $Y_c$ values are slightly easier to compute from the expression
using 10 in the denominator.

As an illustration, Table 13.6 shows the procedure for fitting a logistic
curve to data of the population of Continental United States for 1810-
1950. The population data are shown graphically in Chart 13.12. This
period, including 15 decennial figures, was used instead of the entire
period 1790-1950 in order that comparison could be made with the
method of partial sums of reciprocals, mentioned previously.¹³ In Table
13.6, the three selected points are

$y_0$, the geometric mean of the values for 1810, 1820, and 1830;

$y_1$, the geometric mean of the values for 1870, 1880, and 1890; and

$y_2$, the geometric mean of the values for 1930, 1940, and 1950.

Consequently, $x_0$ is at 1820, $x_1$ at 1880, and $x_2$ at 1940, as shown in the
second column of Table 13.6. Averages of three decennial figures were
used in order to minimize the effect of a single unusually high or low
value; the geometric mean was used in preference to the arithmetic mean,
since the population growth is more nearly a geometric progression than
an arithmetic progression. The value of n is 6, the number of 10-year X
units from $x_0$ to $x_1$ or from $x_1$ to $x_2$.

Chart 13.12. Population of Continental United States, 1810-1950, and
Trend as Shown by a Logistic Curve. The curve has been extended to
show the general shape of the curve. Data of Table 13.6.

TABLE 13.6

Computation of Values for Fit of Logistic Curve to Data of Population of
Continental United States, 1810-1950

                                       Computation of trend values
                 Popu-
                 lation                         Log μ =                         Y_c =
Year   x     X   (millions)   y     0.1381596X  1.274670 -     μ       1 + μ    190.293
                 Y                              0.1381596X                      / (1 + μ)
(1)   (2)   (3)  (4)         (5)    (6)         (7)            (8)     (9)      (10)

1810        -1     7.2              -0.138160    1.412830      25.87   26.87      7.1
1820   x0    0     9.6       9.6     0           1.274670      18.82   19.82      9.6 ✓
1830         1    12.9               0.138160    1.136510      13.69   14.69     13.0
1840         2    17.1               0.276319    0.998351       9.962  10.962    17.4
1850         3    23.2               0.414479    0.860191       7.248   8.248    23.1
1860         4    31.4               0.552638    0.722032       5.273   6.273    30.3
1870         5    39.8               0.690798    0.583872       3.836   4.836    39.3
1880   x1    6    50.2      50.2     0.828958    0.445712       2.791   3.791    50.2 ✓
1890         7    62.9               0.967117    0.307553       2.030   3.030    62.8
1900         8    76.0               1.105277    0.169393       1.477   2.477    76.8
1910         9    92.0               1.243436    0.031234       1.075   2.075    91.7
1920        10   105.7               1.381596   -0.106926       0.7818  1.7818  106.8
1930        11   122.8               1.519756   -0.245086       0.5687  1.5687  121.3
1940   x2   12   131.7     134.6     1.657915   -0.383245       0.4138  1.4138  134.6 ✓
1950        13   150.7               1.796075   -0.521405       0.3010  1.3010  146.3

Data from U. S. Bureau of the Census, U. S. Census of Population: 1950, Vol. 1, Number of Inhabi-
tants, p. 1-3, Table 2. The revised population figure is shown above for 1870. The y values of Column
5 are geometric means of three values centered at $x_0$, $x_1$, and $x_2$. The negative logarithms in Column 7
must be rewritten in their alternative forms with negative characteristic and positive mantissa (e.g.
-0.106926 = 9.893074 - 10) before the values of μ can be obtained.

Chart 13.13. Population of Continental United States, 1810-1950, and
Trend as Shown by a Logistic Curve. The logistic curve has been extended to
show the general shape of the curve. Note that this chart has a reciprocal vertical
scale and that, owing to the compression of the upper part of the scale, the curve of
the observed data and the trend line virtually coincide. Data of Table 13.6.

Using the $y_0$, $y_1$, and $y_2$ values shown in Table 13.6, we obtain the
values of k, a, and b as follows:

$$k = \frac{2y_0y_1y_2 - y_1^2(y_0 + y_2)}{y_0y_2 - y_1^2} = \frac{2(9.6)(50.2)(134.6) - (50.2)^2(9.6 + 134.6)}{(9.6)(134.6) - (50.2)^2} = 190.293.$$

$$a = \log \frac{k - y_0}{y_0} = \log \frac{190.293 - 9.6}{9.6} = \log 18.822188 = 1.274670.$$

$$b = \frac{1}{n}\left[\log \frac{y_0(k - y_1)}{y_1(k - y_0)}\right] = \frac{1}{6}\left[\log \frac{9.6(190.293 - 50.2)}{50.2(190.293 - 9.6)}\right] = \frac{1}{6}\log 0.14826636$$

$$= \frac{1}{6}(9.17104244 - 10) = \frac{1}{6}(-0.82895756) = -0.1381596.$$

Trend equation:

$$Y_c = \frac{190.293}{1 + 10^{(1.274670 - 0.1381596X)}}.$$

Origin, 1820; X units, 10 years.

¹² For the mathematical reasoning behind this type of curve, see Raymond Pearl,
Studies in Human Biology, Williams and Wilkins Company, Baltimore, 1924, Chapter
XXIV.

¹³ For 1810-1950 the method of partial sums yields k = 185.9 millions. The fit
in Table 13.6 shows k = 190.3 for the method of selected points for 1810-1950. The
method of selected points for 1790-1950 (using the geometric means of the first three,
middle three, and last three years as those points) gives k = 189.9 millions. Several
other methods of fitting a logistic curve are given in K. R. Nair, "The Fitting of
Growth Curves," in Oscar Kempthorne, et al., editors, Statistics and Mathematics in
Biology, The Iowa State College Press, Ames, Iowa, 1954, pp. 110-132.


The computation of the trend values for this logistic equation is shown
in the last five columns of Table 13.6. The procedure consists first of
writing

$$\mu = 10^{a+bX}$$

so that

$$Y_c = \frac{k}{1 + \mu}.$$

In our equation,

$$\mu = 10^{(1.274670 - 0.1381596X)}$$

and

$$\log \mu = (\log 10)(1.274670 - 0.1381596X),$$

$$= 1.0(1.274670 - 0.1381596X),$$

$$= 1.274670 - 0.1381596X.$$

The values of μ are obtained in Columns 6, 7, and 8 of Table 13.6. In
Column 9 of this table, the values of 1 + μ are shown, and the $Y_c$ values
are gotten in Column 10. A check on the computations may be had by
comparing the $Y_c$ values for 1820, 1880, and 1940 with the values of $y_0$, $y_1$,
and $y_2$, since the curve must pass through the three selected points. The
check marks in Column 10 of Table 13.6 indicate that agreement is
present.

The trend values have been plotted in Charts 13.12 and 13.13, and the
trend has been extended in both directions to show more clearly the
fundamental shape of the curve. Note that the agreement between the
observed data and the trend is so close that the two can hardly be dis-
tinguished. Note, too, that Chart 13.13 uses a reciprocal vertical scale,
and that in this chart the logistic curve is similar in appearance to the
modified exponential curve.
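
The column-by-column procedure just described may be sketched in Python as follows. The sketch is illustrative only; the names `logistic_trend`, `ORIGIN_YEAR`, and `UNIT_YEARS` are our own, and the constants are those of the trend equation with origin at 1820 and X units of 10 years.

```python
# Constants of the fitted logistic, Yc = K / (1 + 10**(A + B*X)).
K, A, B = 190.293, 1.274670, -0.1381596
ORIGIN_YEAR, UNIT_YEARS = 1820, 10


def logistic_trend(year):
    """Trend value (millions) for a given year, following Columns 6 to 10."""
    x = (year - ORIGIN_YEAR) / UNIT_YEARS   # X in 10-year units
    log_mu = A + B * x                      # Column 7
    mu = 10 ** log_mu                       # Column 8
    return K / (1 + mu)                     # Column 10


trend = {year: logistic_trend(year) for year in range(1810, 1951, 10)}
for year, value in trend.items():
    print(year, round(value, 1))
```

The values printed for 1820, 1880, and 1940 should reproduce the three selected points, which is the check described in the text.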

The logistic curve was mentioned in 1838, and later discussed more
fully, by P. F. Verhulst. In 1920 it was developed independently by
Raymond Pearl and Lowell J. Reed. It is not infrequently referred to as
the Pearl-Reed curve. Pearl and Reed have used the curve to describe
the growth of an albino rat and of a tadpole's tail, the number of yeast
cells in a nutritive solution, the number of fruit flies in a bottle (on a
limited food supply), and, most interesting of all, the number of human
beings in a geographical area. In each case, the phenomenon measured
is population growth, either the number of cells in an organism or the
number of individuals in a region. The law of growth which the logistic
curve describes is stated by Pearl as follows:¹⁴

In a spatially limited universe the amount of increase which occurs in
any particular unit of time, at any point of the single cycle of growth, is
proportional to two things, viz.: (a) the absolute size already attained at
the beginning of the unit interval under consideration, and (b) the
amount still unused or unexpended in the given universe (or area) of
actual and potential resources for the support of growth.

In the case of human populations, new development may expand the
available subsistence and allow a new cycle of growth. For instance,
mankind may pass through a hunting stage, an agricultural stage, and an
industrial stage. Each cultural epoch may then be described by a new
logistic curve spliced onto the old one. Thus,

$$Y_c = k_1 + \frac{k_2}{1 + 10^{a+bX}}$$

describes a curve in which $k_1$ is the new lower limit and $k_1 + k_2$ the new
upper limit. In this equation, $k_1$ is below the upper limit $k_0$ of the
previous logistic and indicates the value at which the previous one was
interrupted.

Apparently waves of immigration and human institutions do not
change the fundamental shape of the curve, although they may modify
the steepness of its slope somewhat. Also, the growth may not be sym-
metrical: the point of inflection need not be halfway between the upper
and the lower asymptotes, nor need the two parts of the curve be of the
same shape. A skewed logistic may be obtained by a slight modification
of the previous formulae.

The theory advanced by Raymond Pearl is not, however, universally
accepted. Some argue that, although the logistic curve is appropriate
enough for fruit flies in a bottle, its extension to human society is unwar-
ranted. Human beings have, and exercise, the power of modifying their
environment and rationally controlling their rate of reproduction.

One use to which the logistic curve is sometimes put is to forecast the
size of the future population. Forecasts based merely upon the extension
of a curve are of dubious value, since they assume no important changes¹⁵
in any of the underlying influences on a series. The extended trend
value of our logistic curve for 1960 is 156.1 million, which is almost cer-
tainly too low. A trend such as we have fitted may also be used to esti-
mate population for earlier years, when reliable records did not exist.
Thus, the population of what is now the continental United States may be
estimated from our equation to have been about 2.8 million in 1780. A
better estimate for 1780 might have been obtained if we had included
1790 and 1800 when determining the constants for the logistic equation.

¹⁴ Raymond Pearl, The Biology of Population Growth, Alfred A. Knopf, New York,
1925, p. 22. See also Raymond Pearl, Introduction to Medical Biometry and Statistics,
W. B. Saunders Company, Philadelphia and London, 1940, Third Edition, p. 459 f.
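
The two extended values quoted above may be verified with a short Python sketch. It is an illustration only, resting on the constants of the fitted equation (k = 190.293, a = 1.274670, b = -0.1381596, origin 1820, X units of 10 years); the function name `extended_trend` is our own.

```python
def extended_trend(year, k=190.293, a=1.274670, b=-0.1381596,
                   origin_year=1820, unit_years=10):
    """Evaluate Yc = k / (1 + 10**(a + b*X)) at any year, inside or
    outside the period to which the curve was fitted."""
    x = (year - origin_year) / unit_years
    return k / (1 + 10 ** (a + b * x))


forward_1960 = extended_trend(1960)    # X = 14, beyond the fitted period
backward_1780 = extended_trend(1780)   # X = -4, before the fitted period

print(round(forward_1960, 1), round(backward_1780, 1))
```

Such a computation shows only what the fitted equation implies; it does not remove the objection that an extended trend assumes no important change in the underlying influences.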

Comparison of the Gompcrlz and logistic curves. The Gompertz 
and logistic curve.s are similar in that they both can be used to describe an 
increasing series which is increasing by a decreasing percentage of growth, 
or a decreasing series which is decreasing by a decreasing percentage of 
decline. They diiTer in that the CJompertz curve involves a constant 
ratio of successive first differences of the log F,; values, while the logistic 


See footnote 5 in Chapter 5. 
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curve entails a constant ratio of successive first differences of the ~ 

* c 

values. 

For the types of series to which we are interested in applying these 
curves, both have upper and lower asymptotes. 

The first dilTcrences of the trend values of a Goinpert/ curve form a 
curve resembling a skewed freciucncy distribution, as showui in part A of 
Chart 13.14, ^Phe first difTeren. es of the trend values of a logistic curve, 
of the typ(^ discussed here, form a curve resembling a normal frc( 4 ueucy 

MIU IONS 
OF POUNDS 



('.hart i3.14A. First DifTrrciirc^s t>f the 'I’r^'nd Valuer of Donicustir 

(kinsuniplioii of Hajoii Fih? Yarn 1915 2055. 


MU-UONS 
OF Pe»?S0NS 



Chart 15.14D. First OilTereiieeh of the Lo^n^tie '['rend Values for Population 
of Continental United States, 1770-2070. 
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distribution (see Chapter 23), as shown in part B of Chart 13.14. Because 
of this characteristic of the logistic curve, observed data are sometimes 
plotted on arithmetic probability paper^® (see Chart 23.9 and the accom- 
panying discussion) to see if the trend appears to be a straight line. If 
so, the logistic curve may be fitted. 

When plotted on semi-logarithmic paper, the Gompertz curve has the
appearance of a modified exponential curve; when plotted on a grid with
a reciprocal vertical scale and an arithmetic horizontal scale
(alternatively, $1/Y_c$ and $X$ may be plotted on arithmetic paper), the
logistic curve has the appearance of a modified exponential curve.
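The contrast between the two curves can be checked numerically. The sketch below is only an illustration: it assumes the forms $Y_c = k a^{b^X}$ for the Gompertz curve and $1/Y_c = 1/k + a b^X$ for the logistic curve, and the constants `k`, `a`, `a_logistic`, and `b` are hypothetical values chosen for the example, not figures from the text.

```python
import math

def first_differences(values):
    """Successive differences: values[1]-values[0], values[2]-values[1], ..."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

def successive_ratios(values):
    """Ratio of each value to the one before it."""
    return [later / earlier for earlier, later in zip(values, values[1:])]

# Hypothetical constants, chosen only for illustration.
k, a, b = 100.0, 0.05, 0.7
a_logistic = 0.2
xs = range(8)

gompertz = [k * a ** (b ** x) for x in xs]                     # assumed Gompertz form
logistic = [1.0 / (1.0 / k + a_logistic * b ** x) for x in xs]  # assumed logistic form

# Gompertz: first differences of the logarithms change by a constant ratio.
gompertz_ratios = successive_ratios(
    first_differences([math.log10(y) for y in gompertz]))

# Logistic: first differences of the reciprocals change by a constant ratio.
logistic_ratios = successive_ratios(
    first_differences([1.0 / y for y in logistic]))

print(gompertz_ratios)
print(logistic_ratios)
```

Under these assumptions both lists of ratios equal the constant `b`, which is the property that distinguishes each curve in the comparison above.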

SELECTING A TREND TYPE 

This, and the preceding chapter, have not attempted an exhaustive 
treatment of the types of trends that may be utilized. However, a suffi- 
cient variety has been given to meet most of the needs for time series 
analysis. With such a large number of trend types available, how can 
one decide which to use? First, the trend type should be compatible 
with the behavior of the forces which we seek to measure. If the object 
is solely to obtain cyclical deviations, the trend should pass through the 
approximate center of each cycle. If it is desired to extend the trend for
purposes of forecasting, the trend and its extension should conform to
expectations dictated by logic. If, for instance, the series is such that it
may logically be expected to flatten out, an asymptotic curve should be
selected. When the objective is solely historical study, the future
behavior of the curve is not so important.

The first step in deciding what trend type to use should always consist 
of plotting the observed data on arithmetic paper and then, if the trend 
is not linear but either (1) upward and concave upward or (2) downward 
and concave upward, on semi-logarithmic paper. Examination of the 
plotted data will frequently provide an adequate basis for deciding upon
the type of trend to use. When further guidance is needed, an approximate
trend may be drawn by inspection and the following tests applied to
the smoothed curve:

1. If the first differences tend to be constant, use a straight line.

2. If the second differences tend to be constant, use a second-degree
curve.

3. If the first differences tend to decrease by a constant percentage, use
a modified exponential.

4. If the approximate trend, when plotted on arithmetic paper, is a
straight line, use a straight line.

5. If the approximate trend, when plotted on semi-logarithmic paper,
is a straight line, use an exponential curve.

6. If the approximate trend, when plotted on semi-logarithmic paper,
resembles a modified exponential, use a Gompertz curve.

7. If the approximate trend, when plotted on a grid with a reciprocal
vertical scale and an arithmetic horizontal scale, resembles a modified
exponential, use a logistic curve. (Alternatively, the reciprocals may be
plotted on an arithmetic grid.)

8. If the first differences resemble a skewed frequency curve, use a
Gompertz curve, or a more complex logistic curve than the one described
here.

9. If the first differences resemble a normal frequency curve, use a
logistic curve.

10. If the first differences of the logarithms are constant, use an
exponential curve.

11. If the second differences of the logarithms are constant, fit a
second-degree curve to the logarithms.

12. If the first differences of the logarithms are changing by a constant
percentage, use a Gompertz curve.

13. If the first differences of the reciprocals are changing by a constant
percentage, use a logistic curve.

14. If the approximate trend values (or the original data), when
expressed as percentages of a selected asymptote, appear linear on
arithmetic probability paper, use a logistic curve.

* This involves: (1) assuming an asymptote and (2) expressing the observed
data as percentages of the asymptote, before plotting. More than one
asymptote may be tried.
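The difference tests in the list above (tests 1, 2, 3, 10, 12, and 13) lend themselves to a quick numerical check. The following is a minimal sketch, not a procedure given in the text: the function names and the "relative spread" measure (standard deviation divided by mean absolute value) are assumptions introduced here simply to quantify how nearly constant a set of differences or ratios is. It assumes positive trend values and non-zero differences.

```python
import math
from statistics import mean, pstdev

def differences(values):
    return [later - earlier for earlier, later in zip(values, values[1:])]

def ratios(values):
    # Assumes no zero values among the differences supplied.
    return [later / earlier for earlier, later in zip(values, values[1:])]

def relative_spread(values):
    """Near 0 when the values are nearly constant (hypothetical yardstick)."""
    scale = mean(abs(v) for v in values)
    return pstdev(values) / scale if scale > 0 else 0.0

def difference_tests(trend):
    """Return one spread figure per test; a small figure means the test is met."""
    logs = [math.log10(y) for y in trend]
    reciprocals = [1.0 / y for y in trend]
    d1 = differences(trend)
    return {
        "straight line": relative_spread(d1),                       # test 1
        "second-degree curve": relative_spread(differences(d1)),    # test 2
        "modified exponential": relative_spread(ratios(d1)),        # test 3
        "exponential": relative_spread(differences(logs)),          # test 10
        "Gompertz": relative_spread(ratios(differences(logs))),     # test 12
        "logistic": relative_spread(ratios(differences(reciprocals))),  # test 13
    }

# Illustrative smoothed trends (hypothetical constants).
exponential_trend = [50 * 1.1 ** x for x in range(10)]
gompertz_trend = [100 * 0.05 ** (0.7 ** x) for x in range(10)]

exponential_result = difference_tests(exponential_trend)
gompertz_result = difference_tests(gompertz_trend)
for name, spread in gompertz_result.items():
    print(f"{name:22s} {spread:.4f}")
```

Several tests can be met at once (an exponential also satisfies the modified-exponential test, being a special case of it), so the figures are an aid to judgment rather than an automatic selector.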


Series are sometimes encountered which appear to have had a trend of
one type during one part of the period and a different trend of the same,
or a different, type during another part of the period. Changes in trend
are most likely to have occurred during the 1930's.

Rarely, several trends, each having the same number of constants,
appear equally suitable for a series of data. In such an event, that one
is to be preferred from which the sum of the squared deviations of the Y
values is a minimum. In making such a comparison, curves fitted to Y
values should not be compared with those fitted to log Y values.

Occasionally, none of the previously mentioned aids will enable one
to decide what trend type to use. This may be because the approximate
trend was not properly selected. Or, it may be that the series does not
conform to any simple mathematical description. In a dynamic world,
the forces in operation are seldom allowed to work out their full effects
before other factors make themselves felt. As a result, any trend type
may be appropriate for only a relatively short period.



CHAPTER 14 


Analysis of Time Series: 

PERIODIC MOVEMENTS I - CONSTANT 
SEASONAL PATTERNS 


As indicated in Chapter 11, there are many types of periodic
movements, including those that repeat themselves daily, weekly, monthly,
or annually. In this chapter most attention will be given to those monthly
movements within a year commonly known as seasonal movements.
The principles laid down can easily be applied to the various other types
of periodic movements. It will be the plan of this discussion to start
with data which lend themselves to very simple treatment, and gradually
to introduce more complex methods as they are required. Consideration
of seasonal movements that vary in their pattern from year to year will,
however, be reserved for the following chapter. In general, all of the
methods involve averaging, in some manner, the values of the different
Januaries, then the values of the different Februaries, and so forth, but
differ chiefly in the degree to which the data are refined before being
averaged.

AN INTRODUCTORY ILLUSTRATION 

Averages of unadjusted data. When the data do not contain cyclical
movements or trend to any appreciable extent, it will suffice to average
the data without making any previous adjustment. An illustration of
such data is the number of books issued and renewed for home use at the
main loan desk of the Columbia University Libraries during the 1952-1953
winter semester. The data are shown in Table 14.1, from which
were excluded those weeks in which a holiday occurred and also the weeks
before final examinations, the week before the Christmas vacation, and
the week before the November 4, 1952 presidential Election Day holiday.
Below each column of data is given the average of that column. The
averages, one for each day of the week, constitute a measure of the
intra-week fluctuation in circulation of books. For convenience, however,
it may be desirable to express this measure in percentage form. By
dividing each of the six daily averages by the average of those six averages
(which is the average per day for the entire period), and expressing each
of the six daily averages as a percentage, we obtain the index shown in the
last row of Table 14.1.
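The computation just described can be sketched in a few lines. This is an illustration only: the function name is an assumption made here, and the circulation figures are those of Table 14.1 as read from the printed table.

```python
# Books issued and renewed, Monday through Saturday, one row per week
# (figures as read from Table 14.1).
circulation = [
    [541, 533, 561, 487, 513, 364],  # week beginning Sept. 29
    [674, 559, 524, 590, 532, 300],  # Oct. 6
    [710, 475, 641, 597, 566, 337],  # Oct. 13
    [659, 484, 540, 543, 500, 376],  # Oct. 20
    [576, 496, 545, 655, 586, 363],  # Nov. 10
    [720, 592, 603, 626, 561, 533],  # Nov. 17
    [666, 539, 548, 504, 545, 464],  # Dec. 1
    [701, 601, 550, 635, 759, 422],  # Dec. 8
    [792, 565, 548, 551, 617, 486],  # Jan. 5
]

def simple_average_index(rows):
    """Average each day over all weeks, then express each daily average
    as a percentage of the average of the daily averages."""
    n_weeks = len(rows)
    daily_means = [sum(column) / n_weeks for column in zip(*rows)]
    overall = sum(daily_means) / len(daily_means)
    return [100 * m / overall for m in daily_means]

index = simple_average_index(circulation)
print([round(v, 1) for v in index])
```

With these figures the rounded results agree with the index row of Table 14.1 (121.0 for Monday down to 73.0 for Saturday).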


TABLE 14.1

Computation of Index of Intra-Week Variation, Using Averages of
Unadjusted Data, of the Number of Books Issued and Renewed for Home
Use at the Main Loan Desk of the Columbia University Libraries,
Winter Semester, 1952-1953

Week              Mon-    Tues-   Wednes-  Thurs-   Fri-    Satur-   Average
beginning:        day     day     day      day      day     day      per day

Sept. 29          541     533     561      487      513     364      499.8
Oct.   6          674     559     524      590      532     300      529.8
Oct.  13          710     475     641      597      566     337      554.3
Oct.  20          659     484     540      543      500     376      517.0
Nov.  10          576     496     545      655      586     363      536.8
Nov.  17          720     592     603      626      561     533      605.8
Dec.   1          666     539     548      504      545     464      544.3
Dec.   8          701     601     550      635      759     422      611.3
Jan.   5          792     565     548      551      617     486      593.2

Arithmetic mean   671.0   538.2   562.2    576.4    575.4   405.0    554.7
Index             121.0    97.0   101.4    103.9    103.7    73.0    100.0

Data from Circulation Department, Columbia University Libraries. Excluded
are weeks in which a holiday occurred and also the week before final
examinations, the week before the Christmas vacation, and the week before
the Nov. 4, 1952 presidential Election Day holiday.


Percentages of simple averages. A glance at the data of average
circulation per day for the nine weeks, shown in the last column of Table
14.1, makes it clear that activity was greater in some weeks than in
others. The procedure which was followed in Table 14.1 allowed the
weeks of larger circulation to exert more weight on the daily averages,
and thus on the index, than that exerted by the weeks of smaller
circulation. It might be thought offhand that such extra weight is highly
desirable, but it must be remembered that we are trying to determine a
typical pattern, and it does not necessarily follow that weeks of large
circulation are weeks having a typical pattern. If the figures for each
day of a given week are expressed as percentages of the average for that
week, as in Table 14.2, each week will be of equal importance in
determining the index of intra-week variation. Furthermore, by putting the
data into percentage form, we can more readily detect erratic variations
from the typical weekly pattern. A study of such percentage data for
each day may lead one to select some average other than the arithmetic
mean. Thus, in the present instance, the percentage data of Table 14.2
have been put into arrays in Table 14.3 and in Chart 14.1. It is clear,
from Chart 14.1, that a periodic movement is present. It is clear, too,



Chart 14.1. Arrays of Percentages of Daily Averages for Each Week
for Number of Books Issued and Renewed for Home Use at the Main
Loan Desk of the Columbia University Libraries, Winter Semester,
1952-1953. Data of Table 14.3. (Vertical scale: per cent.)

Chart 14.2. Indexes of Intra-Week Variation of Number of Books
Issued and Renewed for Home Use at the Main Loan Desk of the
Columbia University Libraries, Winter Semester, 1952-1953. Data
from Tables 14.1 and 14.3.
TABLE 14.2

Percentages* of Daily Averages for Each Week for Number of Books Issued
and Renewed for Home Use at the Main Loan Desk of the Columbia
University Libraries, Winter Semester, 1952-1953

(The daily averages for each week are shown in the last column of Table 14.1.)

Week                                  Wednes-
beginning:      Monday    Tuesday     day       Thursday   Friday    Saturday

Sept. 29        108.2     106.6       112.2      97.4      102.6      72.8
Oct.   6        127.2     105.5        98.9     111.4      100.4      56.6
Oct.  13        128.1      85.7       115.6     107.7      102.1      60.8
Oct.  20        127.5      93.6       104.4     105.0       96.7      72.7
Nov.  10        107.3      92.4       101.5     122.0      109.2      67.6
Nov.  17        118.8      97.7        99.5     103.3       92.6      88.0
Dec.   1        122.4      99.0       100.7      92.6      100.1      85.2
Dec.   8        114.7      98.3        90.0     103.9      124.2      69.0
Jan.   5        133.5      95.3        92.4      92.9      104.0      81.9

* Each row averages 100.0.
Based on data of Table 14.1.


TABLE 14.3

Computation of Index of Intra-Week Variation, Using Percentages of the
Daily Average for Each Week, of the Number of Books Issued and
Renewed for Home Use at the Main Loan Desk of the Columbia
University Libraries, Winter Semester, 1952-1953

                        Mon-    Tues-   Wednes-  Thurs-   Fri-    Satur-   Aver-
Rank                    day     day     day      day      day     day      age

1                       133.5   106.6   115.6    122.0    124.2   88.0
2                       128.1   105.5   112.2    111.4    109.2   85.2
3                       127.5    99.0   104.4    107.7    104.0   81.9
4                       127.2    98.3   101.5    105.0    102.6   72.8
5                       122.4    97.7   100.7    103.9    102.1   72.7
6                       118.8    95.3    99.5    103.3    100.4   69.0
7                       114.7    93.6    98.9     97.4    100.1   67.6
8                       108.2    92.4    92.4     92.9     96.7   60.8
9                       107.3    85.7    90.0     92.6     92.6   56.6

Mean of middle seven    121.0    97.4   101.4    103.1    102.2   72.9      99.7
Index                   121.4    97.7   101.7    103.4    102.5   73.1     100.0

Data of Table 14.2.


that there are a few extreme values which do not fit into the general
pattern. The effect of such extremes can be greatly decreased by using
the median for each day; or, the extreme values can be eliminated by
using the arithmetic mean of a central group of values for each day. In
Table 14.3 the average of the middle 7 values for each day is shown.*
Since these six figures are modified means, they do not average exactly
100.0. Instead, they average 99.7 and are adjusted to average 100.0 by
dividing each of them by 99.7 and multiplying by 100 to obtain the index
shown in the last row of Table 14.3. The indexes of Tables 14.1 and 14.3
are shown in Chart 14.2. They do not differ greatly, because the nine
weeks are not greatly different in importance.

* If the reader will compute an index using the median for each day, or the
mean of the middle five values for each day, he will find that the six values
will differ only slightly from those shown in Table 14.3.
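The modified-mean procedure of Tables 14.2 and 14.3 can be sketched as follows. The function name and the `drop` parameter are assumptions introduced for the illustration, and the circulation figures are those of Table 14.1 as read from the printed table. Because the sketch carries unrounded percentages throughout, its results may differ from the printed index in the last decimal place.

```python
# Books issued and renewed, Monday through Saturday, one row per week
# (figures as read from Table 14.1).
circulation = [
    [541, 533, 561, 487, 513, 364],
    [674, 559, 524, 590, 532, 300],
    [710, 475, 641, 597, 566, 337],
    [659, 484, 540, 543, 500, 376],
    [576, 496, 545, 655, 586, 363],
    [720, 592, 603, 626, 561, 533],
    [666, 539, 548, 504, 545, 464],
    [701, 601, 550, 635, 759, 422],
    [792, 565, 548, 551, 617, 486],
]

def modified_mean_index(rows, drop=1):
    """Index from percentages of each week's own daily average.

    `drop` values are discarded from each end of every day's array
    before averaging (drop=1 keeps the middle seven of nine).
    """
    # Step 1 (Table 14.2): each day as a percentage of its week's average.
    percents = [[100 * v / (sum(week) / len(week)) for v in week]
                for week in rows]
    # Step 2 (Table 14.3): array each day's percentages and average the
    # central group.
    modified = []
    for column in zip(*percents):
        ordered = sorted(column)
        kept = ordered[drop:len(ordered) - drop]
        modified.append(sum(kept) / len(kept))
    # Step 3: adjust so that the six figures average 100.0.
    level = sum(modified) / len(modified)
    return [100 * m / level for m in modified]

modified_index = modified_mean_index(circulation)
print([round(v, 1) for v in modified_index])
```

Calling `modified_mean_index(circulation, drop=2)` gives the mean of the middle five values, which makes it easy to try the comparison the footnote suggests.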

SEASONAL INDEXES OF MONTHLY DATA 

A seasonal index, showing the typical intra-year movement of a series,
is ordinarily based upon monthly data, but such an index may be
constructed from weekly* data. While a seasonal index could be made from
daily data, the index would be likely to reflect intra-month and
intra-week movements as well as seasonal variations. In this text we shall
limit our attention to seasonal indexes obtained from monthly data.

Before setting out to compute a seasonal index, one should be sure that
a seasonal movement is present in the series. This may be apparent from
experience with the subject matter represented by the data. In the case
of the book-circulation data of Table 14.1, the librarians knew that
intra-week variations were present, so no preliminary examination of the
data was necessary. Similarly, the reader knows that seasonal variations
exist in the consumption of ice cream, the use of gasoline, department
store sales, and in various other series. However, the investigator may
not always know if the series in which he is interested has a seasonal, and,
unless he assures himself that a seasonal movement is present, it is
conceivable that he might perform the extensive calculations to be described
later and learn at the very end of his work that his index figures were all
approximately 100.0.

To ascertain if a seasonal is present in a series, it will usually suffice to
draw a curve of the data such as the lighter line of Chart 14.4 or to make
a chart like Chart 14.5. In some instances, it may not be possible to be
sure there is a seasonal movement by examining charts of the raw data
and it may be necessary to proceed far enough with the analysis to make
charts like Charts 14.1 and 14.7. Occasionally charts such as Chart 15.2
must be constructed before a decision can be made.

* The procedure is described on pages 528-538 of the first edition of this text.

Chart 14.3. Index of Typical Seasonal Variation of Life Insurance
Death-Benefit Payments in the United States. From the Division of
Statistics and Research of the Institute of Life Insurance. The index
represents averages of the ratios of actual payments to trend values, the
trend having been fitted to monthly data for 1942 through 1951. (Vertical
scale: per cent.)

A seasonal index based on percentages of trend. If a series of
monthly data exhibits secular trend, a seasonal index computed by either
of the simple methods previously described will have an upward or
downward bias, depending on the direction of the trend. Thus, if the trend
were upward and linear, each December would be higher than the
preceding January by an amount equal to 11/12 of the annual growth, even
if there were no genuine seasonal movement present. Because of this fact,
the seasonal index, which is supposed to exhibit seasonal movements only,
would slope upward; and, if there were a true seasonal movement, the
December index number would be too high relative to the January index
number by 11/12 of the annual growth. Of course, the trend may not be
upward and linear. It may be downward and linear, in which case the
December figure would be too low. If the trend is non-linear, its effect
on a seasonal index computed as in Table 14.1 or Table 14.3 cannot be so
simply stated, but the effect is present and is often pronounced.
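The bias can be seen with a made-up series that has a linear upward trend and no seasonal movement at all. The starting level and annual growth below are hypothetical values chosen for the illustration.

```python
# A purely linear monthly series: no seasonal movement whatever.
annual_growth = 12.0          # hypothetical growth per year
start_level = 100.0           # hypothetical January value of the first year
n_years = 3

monthly = [start_level + annual_growth / 12 * t for t in range(12 * n_years)]
years = [monthly[i:i + 12] for i in range(0, len(monthly), 12)]

# Simple averages of the Januaries, the Februaries, and so on.
month_means = [sum(year[m] for year in years) / n_years for m in range(12)]
overall = sum(month_means) / 12
naive_index = [100 * m / overall for m in month_means]

# December exceeds January by 11/12 of the annual growth.
december_minus_january = month_means[11] - month_means[0]
print(december_minus_january)
print([round(v, 1) for v in naive_index])
```

The index slopes steadily upward from January to December even though the series contains nothing but trend, which is the difficulty the per-cent-of-trend approach was designed to remove.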

The first really useful procedure for computing a seasonal index was
designed to overcome this difficulty and was based on per-cent-of-trend
data. In this method,* the first step consists of determining a trend
equation for the data and obtaining the monthly trend values. Next, the
original monthly data are expressed as percentages of the monthly trend
values. These percentages are put into a table like Table 14.3 but having
12 columns, one for each month. The seasonal index is then obtained
from twelve monthly medians or modified means just as in the last two
rows of Table 14.3.

* It is sometimes referred to as the Falkner method. See "The Measurement
of Seasonal Variation," by Helen D. Falkner, Journal of the American
Statistical Association, June 1924, pp. 167-179.
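The steps of the per-cent-of-trend method can be sketched as below. This is an illustration under stated assumptions, not the published procedure in detail: it assumes a straight-line trend fitted by least squares, uses the median for each month, and runs on a synthetic ten-year series built from a hypothetical seasonal pattern (`true_index`) and a hypothetical linear trend.

```python
from statistics import mean, median

def linear_trend(values):
    """Least-squares straight line evaluated at each month (assumed trend type)."""
    n = len(values)
    t_bar = (n - 1) / 2
    y_bar = mean(values)
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in enumerate(values))
             / sum((t - t_bar) ** 2 for t in range(n)))
    return [y_bar + slope * (t - t_bar) for t in range(n)]

def percent_of_trend_index(values):
    """Seasonal index from monthly data that begin in January."""
    trend = linear_trend(values)
    percents = [100 * y / tr for y, tr in zip(values, trend)]
    # One array per calendar month, as in a 12-column version of Table 14.3.
    monthly_medians = [median(percents[m::12]) for m in range(12)]
    level = mean(monthly_medians)
    return [100 * m / level for m in monthly_medians]

# Synthetic data: hypothetical seasonal pattern times a hypothetical trend.
true_index = [90, 85, 100, 105, 110, 100, 88, 93, 99, 111, 108, 111]
series = [(200 + 1.5 * t) * true_index[t % 12] / 100 for t in range(120)]

estimated_index = percent_of_trend_index(series)
print([round(v, 1) for v in estimated_index])
```

A modified mean could be substituted for `median` without changing the rest of the sketch.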
The per-cent-of-trend method ignores the disturbing effect of cyclical
ups and downs. The highs and lows of cycles would appear as extreme
dots in a chart like Chart 14.1 but which would have twelve arrays
instead of six. This method depends upon the averaging process, that is,
upon the use of the median or a modified mean, to eliminate the effect of
cyclical highs and lows. At present, it is not a widely used method, but
it may be applied to series having cyclical movements which are
unimportant relative to the seasonal movements. Such a series is the
payment of life insurance death benefits in the United States, and Chart
14.3 shows the seasonal index for this series computed by the
ratio-to-trend method.

Chart 14.4. Consumption of Newsprint by United States Publishers,
January 1943-June 1953, and Twelve-Month Moving Average.
Data of Table 14.5. (Vertical scale: thousands of short tons.)

Percentages of centered 12-month moving averages. The data
which we shall use to illustrate the determination of a seasonal index,
which does not change from year to year, have to do with the consumption
of newsprint by United States publishers. Charts 14.4 and 14.5 make it
clear that a seasonal movement is present and that it is approximately
the same from year to year. Chart 14.5 may be termed a "year-over-year"
chart, since each year is arbitrarily placed above the preceding
year; the curve for each year has been plotted to the same vertical scale,
but at a different level.

The data of newsprint consumption have not been adjusted for calendar
variation. The reason for not making this adjustment is that the
published data are not so adjusted. If a seasonal index were to be made from
the data adjusted for calendar days, then all monthly figures, including
new ones as they appear, would have to be adjusted before they could be
compared to the typical seasonal movement. Users of such data are, not
infrequently, more interested in the monthly figures than in the per-day
figures, the length of a month being sometimes thought of as contributing
its part toward the typical seasonal variation. The procedure for
computing an index of seasonal variation is the same whether the data have or
have not been adjusted for calendar variation. One adjustment, however,
has been made: February 1944, February 1948, and February 1952,
each of which had 29 days, were adjusted to a 28-day basis.

Chart 14.5. Year-Over-Year Charts of: (A) Consumption of Newsprint
and (B) Percentages of Twelve-Month Moving Average, 1944-1952. Data
of Table 14.5. In each part of the chart, the curve for each year is placed
just above the curve for the preceding year. This is accomplished by using
the same vertical scale for each of the nine curves, but raising or
lowering the scale, as necessary.

The percentage-of-12-month-moving-average method, which is
ordinarily referred to merely as the per-cent-of-moving-average method (or
just moving-average method), is in wide current use. It differs from the
per-cent-of-trend method only in that the original data are expressed as
percentages of the moving average instead of as percentages of trend.
Computing the centered 12-month moving average involves more work
than does the determination of trend values, but the resulting seasonal
index is a better one. This is so because the moving average is a fairly
good estimate of trend and cyclical movements combined.

A 12-month moving average is a series of averages which embraces,
first, the first 12 months of a series; next, the second to thirteenth months;
then the third to fourteenth months; and so on. To be more specific, let
us consider the data of newsprint consumption by United States
publishers, shown in Table 14.4. The first figure for the 12-month moving
average is the average of the first 12 months, January 1943-December
1943. In Column 4 of the table this is seen to be 226.68. Note that,
being the average of the 12-month period January-December 1943, this
figure is centered between June and July 1943. The second
moving-average figure, 224.02, covers the period February 1943-January 1944
and is centered between July and August 1943. Each figure in Column 4
of Table 14.4 is the arithmetic mean of the six original figures which
precede it and the six original figures which follow it.

Since the figures in Column 4 of Table 14.4 fall between each pair of
months, while the original data in Column 2 are for calendar months and
are centered at the middle of each month, it is necessary to adjust the
moving averages so that they will be in step with the original data. This
process is called centering,* and involves computing a two-month moving
average of the 12-month moving averages. Columns 5 and 6 of Table
14.4 show how this is done. The result is a series of moving averages,
properly centered and beginning with July 1943. These moving averages
have been plotted in Chart 14.4.
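The two-stage computation of Table 14.4 can be sketched as follows. The helper name `moving_average` is an assumption made for the illustration; the consumption figures for 1943 and 1944 are those of Table 14.5 as read from the printed table.

```python
# Newsprint consumption, thousands of short tons, January 1943-December 1944
# (figures as read from Table 14.5).
consumption = [
    226.7, 208.1, 237.1, 243.3, 248.3, 228.4,   # Jan.-June 1943
    212.3, 217.1, 222.7, 235.5, 222.3, 218.4,   # July-Dec. 1943
    194.7, 176.2, 201.7, 201.1, 197.4, 191.1,   # Jan.-June 1944
    174.9, 182.4, 189.6, 218.1, 211.6, 206.0,   # July-Dec. 1944
]

def moving_average(values, span):
    """Unweighted moving average over `span` consecutive values."""
    return [sum(values[i:i + span]) / span
            for i in range(len(values) - span + 1)]

# Column 4: 12-month moving averages, each falling between two months
# (the first lies between June and July 1943).
ma12 = moving_average(consumption, 12)

# Column 6: a two-month moving average of Column 4 centers the figures
# on calendar months, beginning with July 1943.
centered = moving_average(ma12, 2)

print(round(ma12[0], 2), round(ma12[1], 2))
print(round(centered[0], 2))
```

With two years of data the sketch yields 13 uncentered averages and 12 centered ones, the latter running from July 1943 through June 1944.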

* Some statisticians do not bother to center a 12-month moving average, but
arbitrarily place the average for each 12 months opposite the seventh month,
contending that the loss in accuracy is more than offset by the saving in
time. If a centered 12-month moving average is computed by the method
described on the following pages and illustrated in Table 14.5, and if a
mask is used to obtain the moving totals (see F. E. Croxton, Workbook in
Applied General Statistics, Prentice-Hall, Inc., New York, 1950, third
edition, p. 95), the centered 12-month moving average can be obtained almost
as quickly as can the uncentered 12-month moving average.
TABLE 14.4

Computation of Centered 12-month Moving Average for Consumption of
Newsprint by United States Publishers, January 1943-June 1953

(Entries in Columns 3 and 4 fall between months and are shown on the
line between the two months concerned.)

                 Consumption                 12-month                  Centered
                 (thousands     12-month     moving       2-month     12-month
Year and         of short       moving       average      moving      moving average
month            tons)          total        Col. 3 ÷ 12  total       Col. 5 ÷ 2
   (1)              (2)           (3)          (4)          (5)         (6)

1943
January            226.7
February           208.1
March              237.1
April              243.3
May                248.3
June               228.4
                                2,720.2      226.68
July               212.3                                  450.70      225.4
                                2,688.2      224.02
August             217.1                                  445.38      222.7
                                2,656.3      221.36
September          222.7                                  439.77      219.9
                                2,620.9      218.41
October            235.5                                  433.30      216.6
                                2,578.7      214.89
November           222.3                                  425.54      212.8
                                2,527.8      210.65
December           218.4                                  418.19      209.1
                                2,490.5      207.54
1944
January            194.7                                  411.96      206.0
                                2,453.1      204.42
February           176.2                                  405.95      203.0
                                2,418.4      201.53
March              201.7                                  400.31      200.2
                                2,385.3      198.78
April              201.1                                  396.10      198.0
                                2,367.9      197.32
May                197.4                                  393.75      196.9
                                2,357.2      196.43
June               191.1                                  391.83      195.9
                                2,344.8      195.40
July               174.9                                  390.01      195.0
                                2,335.3      194.61
August             182.4                                  389.13      194.6
                                2,334.2      194.52
September          189.6                                  389.13      194.6
                                2,335.3      194.61
October            218.1                                  389.39      194.7
                                2,337.4      194.78
November           211.6                                  390.26      195.1
                                2,345.8      195.48
December           206.0                                  390.91      195.5
                                2,345.2      195.43

1952
                                4,513.9      376.16
January                                                   752.01      376.0
                                4,510.2      375.85
February                                                  751.46      375.7
                                4,507.3      375.61
March                                                     751.08      375.5
                                4,505.6      375.47
April                                                     752.66      376.3
                                4,526.3      377.19
May                                                       755.57      377.8
                                4,540.5      378.38
June                                                      756.66      378.3
                                4,539.3      378.28
July                                                      757.10      378.6
                                4,545.8      378.82
August                                                    758.42      379.2
                                4,555.2      379.60
September                                                 761.01      380.5
                                4,576.9      381.41
October                                                   764.10      382.0
                                4,592.3      382.69
November                                                  767.51      383.8
                                4,617.8      384.82
December                                                  769.74      384.9
                                4,619.1      384.92
1953
January            351.8
February           346.0
March              421.0
April              408.9
May                429.6
June

Data from U. S. Department of Commerce, 1953 Biennial Edition, p. 179; 1951
Biennial Edition, p. 178; and 1947 Statistical Supplement to the Survey of
Current Business, p. 160.
It is clear from Chart 14.4 that the centered moving-average figures do
not reflect, to any appreciable degree, either the seasonal movement or
irregular movements. It is not so clear, from Chart 14.4, that the moving
average follows, approximately, the combined trend and cyclical pattern,
since there is little cyclical movement in the series of newsprint
consumption during the period under consideration. That a centered 12-month
moving average does, indeed, describe the approximate trend and
cyclical movements* may be observed more satisfactorily in Chart 15.1.

Before proceeding with the computation of the seasonal index for
newsprint consumption, it will be well to look again at Table 14.4 and to
note that the procedures indicated in that table are more laborious than
necessary. We do not need to compute the moving average of Column 4.
We could, instead, compute a two-month moving total of the figures in
Column 3 and then divide each of these totals by 24 to obtain exactly the
same figures as are shown in Column 6 of Table 14.4. There is, however,
an even more expeditious procedure, which we shall employ. Consider
the centered moving average for July 1943. This figure was obtained
by totaling the value for January 1943, twice the value for February 1943,
twice the value for each of the following months through December 1943,
and the value for January 1944, and dividing this total by 24. Similarly,
the average for August 1943 is the result of dividing by 24 the sum of: the
February 1943 value, twice each of the next 11 values, and the value for
February 1944. In other words, what we have actually done in
computing a centered 12-month moving average is to compute a 13-month
moving average with the months weighted 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1.
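The equivalence can be verified directly. In the sketch below the function names are assumptions made for the illustration, and the thirteen consumption figures (January 1943 through January 1944) are those of Table 14.5 as read from the printed table.

```python
# Newsprint consumption, thousands of short tons, January 1943-January 1944
# (figures as read from Table 14.5).
consumption = [
    226.7, 208.1, 237.1, 243.3, 248.3, 228.4,
    212.3, 217.1, 222.7, 235.5, 222.3, 218.4,
    194.7,
]

WEIGHTS = [1] + [2] * 11 + [1]   # the 13 weights, which sum to 24

def weighted_13_month_total(values, start):
    """Weighted moving total for the 13 months beginning at index `start`."""
    window = values[start:start + 13]
    return sum(w * v for w, v in zip(WEIGHTS, window))

def two_step_centered_average(values, start):
    """The same figure by the longer route of Table 14.4."""
    first = sum(values[start:start + 12]) / 12        # e.g. Jan.-Dec. 1943
    second = sum(values[start + 1:start + 13]) / 12   # e.g. Feb. 1943-Jan. 1944
    return (first + second) / 2

july_total = weighted_13_month_total(consumption, 0)
july_centered = july_total / 24
print(round(july_total, 1), round(july_centered, 2))
print(round(two_step_centered_average(consumption, 0), 2))
```

Both routes give the same centered average for July 1943; the weighted total has the advantage that only one division is needed per month.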

Table 14.5 shows the computation of the weighted 13-month moving
total and of the 12-month centered moving average. The procedure is
as follows:

1. Using an adding machine, compute the weighted 13-month moving
total for July of each year and also the last moving total, which in Table
14.5 is for December 1952. The total for each July will include values

* When a series shows pronounced cyclical movements the centered 12-month
moving average may not move high enough into the cyclical peaks or low
enough into the cyclical lows. It should be clear why this is so, since, when
a centered 12-month moving average is centered at a cyclical high point, the
average would be influenced not only by the value for the middle month, but
also by the six preceding and the six following months, all or most of which
would have values lower than that of the middle month. The reverse would be
true when the moving average is centered at a cyclical low point. Because of
the foregoing, some statisticians smooth and alter the moving-average curve,
usually by a freehand process, to obtain what is believed to be a better
estimate of the combined trend and cyclical movements. The original values
are then expressed as percentages of the values on this new curve. See, for
example, "Adjustment for Seasonal Variation," by H. C. Barton, Federal
Reserve Bulletin, June 1941, pp. 518-528.
TABLE 14.5

Short Method of Computing Centered 12-month Moving Average and
Percentages of Moving Average for Consumption of Newsprint by United
States Publishers, January 1943-June 1953

                                 13-month        Centered       Per cent of
                 Consumption     moving total    12-month       12-month
                 (thousands of   weighted        moving         moving
Year and month   short tons)     1, 2, 2, ...,   average        average
                                 2, 2, 1         Col. 3 ÷ 24    Col. 2 ÷ Col. 4
     (1)             (2)             (3)             (4)             (5)

1943
January             226.7
February            208.1
March               237.1
April               243.3
May                 248.3
June                228.4
July                212.3         5,408.4          225.4            94.2
August              217.1         5,344.5          222.7            97.5
September           222.7         5,277.2          219.9           101.3
October             235.5         5,199.6          216.6           108.7
November            222.3         5,106.5          212.8           104.5
December            218.4         5,018.3          209.1           104.4

1944
January             194.7         4,943.6          206.0            94.5
February            176.2         4,871.5          203.0            86.8
March               201.7         4,803.7          200.2           100.7
April               201.1         4,753.2          198.0           101.6
May                 197.4         4,725.1          196.9           100.3
June                191.1         4,702.0          195.9            97.5
July                174.9         4,680.1          195.0            89.7
August              182.4         4,669.5          194.6            93.7
September           189.6         4,669.5          194.6            97.4
October             218.1         4,672.7          194.7           112.0
November            211.6         4,683.2          195.1           108.5
December            206.0         4,691.0          195.5           105.4

1 U f Ti 1 






Jraiuarv . ‘ 

1 

o 

1 uo.’, 1 

105 •; 

9 1 7 

Fubni.irv ! 

‘75 

1 

l,71t. 9 

lot. 

SO 1 

Marrh * 

2‘ ‘2 

s 

4.7(;i 1 

1‘N 4 

102 2 

April 1 

20.; 

o 

4.S(^ 0 

200 2 

10 1 5 

May i 


S 

1 so; 9 

202 0 

101 9 

Juno . ' 

, iOO 


4 SOO S 

20.; 8 

93 5 

July .! 

177 

0 

i i,04i; iv 

206 i 

86 3 

Aut^ust 

2o2 

0 

I 5.030 1 

2t>0 6 

06 8 

8optonihf'r 

2];; 

3 

i 5. 1 13 1 

211,3 

99 5 

Ootohor 

2.;o 

9 

, 5,263 S 

210 3 

lOS 0 

Novo rubor 

! 22i() 

1 

1 5 .37.) o 

22 1 0 

105.4 

JJoooiulu'r 

1 225 

1 

\ 5,t09^ 

220 2 

98 3 

ivhk. 

1 





January .... 

1 221 

1 

1 5,1). 73 8 

23 1 7 

94 2 

February 

222 

‘2 

5,7 >;> 4 

.. J J 

o.> 1 

Maroh ... 

2i>7 

7 

5 soO 1 

214 2 

100 6 

April ... . 

250 

0 

: 5,,h-7 7 

21 s 7 

14)4.1 

May 

1 201 

h 

! (’) 07S 1 

253 3 

103 2 

Juno 

2.V2 

3 

6,20.; 2 

25S 5 

100.3 

July . . . 

! 

1 

6,317 'K 

2‘;.i 2 

02 4 

Auru.-I 

' 25 7 

3 

i 6.2,98 1 

266 . 6 

06.5 

Soptonilior 

j 2u.S 

('» 

6. MhS 6 

2t\9 5 

08,6 

October 

1 202 

2 

i 6,512.1 

272 6 

107.2 

Novorribor 

1 2‘Jl 

5 

i 6,622 4 

: 275 9 

105 7 

Doi’oinbor 

1 294 

S 

i 6,607 0 

1 279.0 

105 7 
TABLE 14.5 (Continued)

                                  13-month moving    Centered         Per cent of
                  Consumption     total weighted     12-month         12-month
Year and month    (thousands of   1, 2, 2, ...,      moving average   moving average
                  short tons)     2, 2, 1            Col. 3 ÷ 24      Col. 2 ÷ Col. 4
     (1)              (2)             (3)                (4)              (5)

1947
  January            266.4          6,751.0            281.3             94.7
  February           258.4          6,795.4            283.1             91.3
  March              302.7          6,853.4            285.6            106.0
  April              297.6          6,934.7            288.9            103.0
  May                303.0          7,028.3            292.8            103.5
  June               292.7          7,102.1            295.9             98.9
  July               263.7          7,155.6√           298.1             88.5
  August             281.1          7,220.6            300.9             93.4
  September          299.8          7,295.2            304.0             98.6
  October            339.3          7,375.9            307.3            110.4
  November           338.0          7,466.8            311.1            108.6
  December           322.1          7,547.0            314.5            102.4

1948
  January            292.5          7,609.3            317.1             92.2
  February           297.4          7,670.1            319.6             93.1
  March              338.3          7,740.4            322.5            104.9
  April              342.6          7,820.2            325.8            105.2
  May                348.8          7,888.9            328.7            106.1
  June               327.1          7,956.8            331.5             98.7
  July               291.6          8,038.6√           334.9             87.1
  August             314.0          8,090.2            337.1             93.1
  September          337.2          8,130.2            338.8             99.5
  October            381.7          8,185.1            341.0            111.9
  November           364.3          8,254.8            344.0            105.9
  December           363.7          8,321.0            346.7            104.9

1949
  January            332.7          8,365.3            348.6             95.4
  February           308.8          8,390.8            349.6             88.3
  March              366.9          8,414.1            350.6            104.6
  April              368.9          8,451.0            352.1            104.8
  May                392.2          8,482.9            353.5            110.9
  June               349.9          8,506.0            354.4             98.7
  July               313.1          8,527.2√           355.3             88.1
  August             318.0          8,564.0            356.8             89.1
  September          356.6          8,618.4            359.1             99.3
  October            399.3          8,683.3            361.8            110.4
  November           378.6          8,727.9            363.7            104.1
  December           372.6          8,764.2            365.2            102.0

1950
  January            345.1          8,814.6            367.3             94.0
  February           333.2          8,867.0            369.5             90.2
  March              396.9          8,913.1            371.4            106.9
  April              403.8          8,951.9            373.0            108.3
  May                401.9          9,002.7            375.1            107.1
  June               376.6          9,057.8            377.4             99.8
  July               336.8          9,084.1√           378.5             89.0
  August             346.8          9,088.0            378.7             91.6
  September          373.8          9,088.9            378.7             98.7
  October            420.8          9,093.3            378.9            111.1
  November           407.9          9,101.6            379.2            107.6
  December           398.3          9,091.6            378.8            105.1
                                  13-month moving    Centered         Per cent of
                  Consumption     total weighted     12-month         12-month
Year and month    (thousands of   1, 2, 2, ...,      moving average   moving average
                  short tons)     2, 2, 1            Col. 3 ÷ 24      Col. 2 ÷ Col. 4
     (1)              (2)             (3)                (4)              (5)

1951
  January            345.6          9,077.0            378.2             91.4
  February           336.6          9,071.3            378.0             89.0
  March              394.4          9,076.6            378.2            104.3
  April              410.7          9,068.7            377.9            108.7
  May                403.2          9,048.1            377.0            106.9
  June               365.3          9,032.5            376.4             97.1
  July               333.4          9,021.7√           375.9             88.7
  August             344.5          9,021.4            375.9             91.6
  September          381.4          9,026.3            376.1            101.4
  October            405.3          9,014.0            375.6            107.9
  November           402.8          8,997.7            374.9            107.4
  December           387.8          9,013.2            375.6            103.2

1952
  January            345.3          9,024.1            376.0             91.8
  February           336.6          9,017.5            375.7             89.6
  March              399.3          9,012.9            375.5            106.3
  April              393.5          9,031.9            376.3            104.6
  May                404.1          9,066.8            377.8            107.0
  June               379.9          9,079.8            378.3            100.4
  July               329.7          9,085.1√           378.5             87.1
  August             341.6          9,101.0            379.2             90.1
  September          379.7          9,132.1            380.5             99.8
  October            426.0          9,169.2            382.0            111.5
  November           417.0          9,210.1            383.8            108.7
  December           386.6          9,236.9√           384.9            100.4

1953
  January            351.8            ...                ...              ...
  February           346.0            ...                ...              ...
  March              421.0            ...                ...              ...
  April              408.9            ...                ...              ...
  May                429.6            ...                ...              ...
  June               381.2            ...                ...              ...

Data from U.S. Department of Commerce, Business Statistics, 1953 Biennial Edition, p. 179; 1951
Biennial Edition, p. 178; and Supplement to the Survey of Current Business, p. 160.


from the preceding January to the following January, inclusive. The
total for December 1952 will include values from June 1952 through June
1953. These values are entered in Column 3 of Table 14.5 and serve as
check values for the moving totals to be obtained in step 2.

2. Using an adding machine⁶ which will subtract, enter the weighted
moving total figure for July 1943. Subtract the values for January and
February 1943, add the values for January and February 1944, and
subtotal. This subtotal is the weighted moving total for August 1943.
Next subtract the values for February and March 1943, add the values
for February and March 1944, and subtotal. This second subtotal is the
value for September 1943. Continue the process of subtracting two
values, adding two values, and subtotaling, as shown in the accompanying
reproduction of a portion of an adding-machine tape. When the subtotal
is obtained for July 1944, it should agree with the figure already
obtained. Agreement is indicated for all of the July figures, and for
December 1952, by check marks in Column 3 of Table 14.5.

⁶ If an adding machine with a subtraction bar is not available, a calculating machine
may be used. It is possible to subtract on an adding machine which has no sub-
traction bar by adding the complement of a number (for example, the complement of
276 would be entered as 99999724 on an eight-column adding machine). However,
adding complements is not recommended for use in step 2, as the operator is likely to
make numerous mistakes.

3. Compute the centered moving average by dividing each figure in
Column 3 of Table 14.5 by 24. This division may be accomplished most
expeditiously by placing the reciprocal of 24 (which is 0.04166667) in
the keyboard of a calculating machine and multiplying it by the values
shown in Column 3 of Table 14.5. The machine need not be cleared
between multiplications, since it is merely necessary to increase or
decrease the multiplier to obtain the next product. If a calculating
machine having automatic multiplication is being used, it will probably
be preferable to clear out the result of each multiplication before
proceeding to the next one; 0.04166667 should be retained in the
machine for all of the multiplications. The results are shown in
Column 4 of Table 14.5.
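The subtract-two, add-two routine of step 2 and the division of step 3 can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the 14 monthly values are the January 1947 to February 1948 consumption figures as read from Column 2 of Table 14.5, so a result may differ from the printed total in the last decimal place.

```python
RECIPROCAL_OF_24 = 1 / 24  # the text keys this in as 0.04166667


def weighted_total(values, center):
    """Weighted 13-month total (weights 1, 2, ..., 2, 1) centered on index `center`."""
    window = values[center - 6 : center + 7]
    return window[0] + 2 * sum(window[1:-1]) + window[-1]


def next_total(prev_total, values, center):
    """Total for month center+1, obtained from the total for month `center`
    by subtracting two values and adding two values (step 2)."""
    return (prev_total
            - values[center - 6] - values[center - 5]
            + values[center + 6] + values[center + 7])


# January 1947 through February 1948 (assumed readings of Table 14.5, Column 2).
consumption = [266.4, 258.4, 302.7, 297.6, 303.0, 292.7, 263.7,
               281.1, 299.8, 339.3, 338.0, 322.1, 292.5, 297.4]

july_total = weighted_total(consumption, 6)            # centered on July 1947
august_total = next_total(july_total, consumption, 6)  # drop Jan, Feb 1947; add Jan, Feb 1948
july_average = july_total * RECIPROCAL_OF_24           # step 3: centered moving average

print(round(july_total, 1), round(august_total, 1), round(july_average, 1))
```

The incremental total for August equals the total computed from scratch, which is the same agreement the check marks in Column 3 record.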

[Reproduction of a portion of an adding-machine tape: each subtotal
(marked S) follows two subtractions and two additions, as described in
step 2.]

The next step in computing the seasonal index consists of expressing
each original value as a percentage of the corresponding centered
moving average. The results of this step are shown in Column 5 of
Table 14.5 and in Chart 14.6. The logic of this procedure is as
follows: Time series are assumed to be composed of T × C × S × I
(Trend × Cycle × Seasonal × Irregular). The 12-month moving average is
a rough estimate of T × C because the 12-month average smooths out
seasonal movements and, for the most part, irregular movements, since
the latter are largely movements of small amplitude and short duration.
If now we divide the original data by the 12-month moving average, we
have an estimate of the seasonal and irregular movements combined:

$$\frac{T \times C \times S \times I}{T \times C} = S \times I$$
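The cancellation of T × C can be illustrated with a small Python sketch. The series below is hypothetical (a constant level of 200 multiplied by an assumed seasonal pattern whose 12 factors sum to 12), chosen so that the percentages of the centered moving average reproduce the seasonal factors exactly; real data would also carry the irregular component. Function names are illustrative only.

```python
def centered_moving_average(values):
    """Map each usable month index to its centered 12-month moving average."""
    averages = {}
    for c in range(6, len(values) - 6):
        window = values[c - 6 : c + 7]                       # 13 consecutive months
        total = window[0] + 2 * sum(window[1:-1]) + window[-1]  # weights 1, 2, ..., 2, 1
        averages[c] = total / 24
    return averages


def percent_of_moving_average(values):
    """Original value as a percentage of the moving average (an estimate of S x I)."""
    return {c: 100 * values[c] / avg
            for c, avg in centered_moving_average(values).items()}


# Hypothetical seasonal factors (January..December); they sum to 12.
seasonal = [0.94, 0.90, 1.05, 1.05, 1.06, 0.99, 0.88, 0.92, 0.99, 1.10, 1.07, 1.05]
level = 200.0  # hypothetical T x C, held constant for simplicity
series = [level * seasonal[m % 12] for m in range(36)]

percentages = percent_of_moving_average(series)
for c in sorted(percentages)[:3]:
    print(c, round(percentages[c], 1))
```

Six months are lost at each end of the series, which is why Column 3 of Table 14.5 has no entries for the first and last half-years.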


Chart 14.6 shows quite clearly the presence of the seasonal movement,
which seems to be approximately the same from year to year. It is not
exactly the same, since the spring peak is sometimes March, sometimes
April, and sometimes May; also, the fall peak occurs in October, but
occasionally November is almost as high.

From this point on, the procedure parallels that used for the library-
circulation data expressed in percentage terms. First, however, we make
Table 14.6, which puts the per-cent-of-moving-average data into a form
which facilitates the construction of the arrays, which are shown in Table
14.7. Notice that only those years for which 12 per-cent-of-moving-
average figures were available are included in Tables 14.6 and 14.7.

Chart 14.6. Percentages of Twelve-Month Moving Average for
Consumption of Newsprint by United States Publishers, 1944-1952. Data of
Table 14.5 or 14.6.

After making a table of the monthly arrays, a chart, such as Chart 14.7,
should be constructed. A chart of the monthly arrays is often useful in
helping one to decide what measure of central tendency to use in averag-
ing the months; in addition, it gives a general indication of the seasonal
pattern.

There are two ways of deciding what items to eliminate. One way is
to consider each array of Chart 14.7 separately and to eliminate items that
appear to be unusually high or low, perhaps studying each large deviation
individually and eliminating those for which a special circumstance can
be discovered. If this method is followed, one array might use an
average of all items; another might employ the median; a third, the
central five items; a fourth, all items except the two highest; and so on.
On account of the extreme subjectivity of the method, it is dangerous
unless the statistician possesses a high order of knowledge and judgment.
An alternative method, which is probably more frequently used, consists
of computing the same type of modified mean for each month. No
generally applicable rule can be set up for the selection of the appropriate
modified mean, but the exclusion of the one highest value and one lowest
value or the two highest and the two lowest values will often be found to
be satisfactory. The number of items to exclude depends partly on the
number of cycles included in a series; the larger the number of cyclical
highs and lows which are reflected in the percentages of moving average
(because they have not been completely smoothed out by the moving
average), the more extreme items which may need to be excluded. For
the newsprint consumption data of Table 14.7, we have used the mean of
the middle seven values, with the results shown in the next-to-the-last
row of the table.

Chart 14.7. Arrayed Percentages of Moving Average and Seasonal
Index for Consumption of Newsprint by United States Publishers,
1944-1952. Data of Table 14.7. The highest and lowest value in each
array was excluded for purposes of computing the seasonal index.

The 12 modified means average 99.8. When each modified mean is
divided by 99.8 and multiplied by 100, we get the seasonal index⁷ shown
in the last row of Table 14.7 and in Chart 14.7. Note that the 12 values
of the seasonal index average 100.0. This is important, since seasonal
variations will later be removed from the original data by dividing the
original data by the seasonal index. If the seasonal index were to average
less than 100.0, the adjusted figures would all be a little too large; if the

⁷ A seasonal index based on the mean of the middle five items in Table 14.7 is so
nearly the same that the curve could hardly be distinguished from that shown in
Chart 14,7. The j^reatest difTereiu ^* for 1 ny one month is O.'d. The same is true for 
an index based on monthly medians, e.vcept that one month (May) shows a difference 
of 0.8. 
seasonal index were to average more than 100.0, the adjusted figures
would all be slightly too small.
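The modified mean and the adjustment that makes the 12 index values average 100 can be sketched as follows. The arrays are hypothetical stand-ins for the monthly arrays of Table 14.7 (which is not reproduced here), and the function names are illustrative; only the arithmetic follows the text.

```python
def modified_mean(values, drop=1):
    """Mean of what remains after discarding the `drop` highest and `drop` lowest values."""
    ordered = sorted(values)
    kept = ordered[drop : len(ordered) - drop]
    return sum(kept) / len(kept)


def seasonal_index(arrays, drop=1):
    """arrays: one list of per-cent-of-moving-average values for each month."""
    means = [modified_mean(a, drop) for a in arrays]
    average_of_means = sum(means) / len(means)            # 99.8 in the text's example
    return [100 * m / average_of_means for m in means]    # index values now average 100


# Hypothetical arrays: a base level per month plus five assumed deviations.
base = [94.0, 90.0, 105.0, 105.0, 106.0, 99.0, 88.0, 92.0, 99.0, 110.0, 107.0, 105.0]
arrays = [[b + d for d in (-3.0, -1.0, 0.0, 1.5, 6.0)] for b in base]

index = seasonal_index(arrays, drop=1)
print([round(v, 1) for v in index])
```

With nine values per array, `drop=1` gives the mean of the middle seven values used for the newsprint data.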

Link relatives. At one time the link-relative method was the most 
widely used method of obtaining a seasonal index. The computations 
involved are less extensive than those required by the moving-average 
method, but the link-relative method is less satisfactory than the 
moving-average method; in particular, it is not readily adaptable to the 
determination of changing seasonal movements, a topic treated in the 
following chapter. 

The first step in this method consists of expressing each monthly value
as a percentage of the preceding monthly value. These are the link rela-
tives. From this point on, the procedure⁸ is the same as shown in Table
14.7, except that the 12 monthly averages are generally found to contain
some residual trend, which was not eliminated by computing the link
relatives. Adjustment for this residual trend must be made before the
seasonal index is obtained.
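The first step of the link-relative method is simple enough to state in a few lines of Python. This sketch covers only the computation of the link relatives themselves; the function name is hypothetical, the sample values are assumed readings of the first four months of 1951 from Table 14.5, and the later averaging and residual-trend adjustment are not shown.

```python
def link_relatives(values):
    """Each value as a percentage of the value for the preceding month."""
    return [100 * current / previous
            for previous, current in zip(values, values[1:])]


# January-April 1951 consumption (assumed readings of Table 14.5, Column 2).
consumption = [345.6, 336.6, 394.4, 410.7]
print([round(r, 1) for r in link_relatives(consumption)])
```

A series of n monthly values yields n - 1 link relatives, since the first month has no predecessor.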

ADEQUACY OF THE SEASONAL INDEX 

One test of a seasonal index is provided by the chart of the arrays, as
shown in Chart 14.7. If the individual arrays are widely dispersed (that
is, cover a wide range vertically), we can have little confidence in the
seasonal index. The less the dispersion of the individual monthly
arrays, the more uniform is the seasonal movement from year to year.

It is possible to ascertain (by the method described in Chapter 24)
whether a given modified mean differs significantly from 100 or, using
the method of analysis of variance (discussed in Chapter 26), to ascertain
whether the 12 modified means as a group differ significantly from each
other. However, these procedures are of dubious value, primarily because
the distributions from which the means were computed were not random
distributions, and also because the means were modified means, computed
after part of the data had been rejected.

A practical test of the adequacy of a seasonal index is to use it to
eliminate the seasonal variation in the series, and then to observe whether
any residual seasonal movements are present. We shall return to this
point in Chapter 16.

⁸ The method is more fully described on pp. 486-492 of the first edition of this text.
The advantages and disadvantages of the link-relative method are set forth there in
more detail.



CHAPTER 15 


Analysis of Time Series: 

PERIODIC MOVEMENTS II— CHANGING 
SEASONAL PATTERNS 


In Chapter 14 we considered procedures for determining seasonal 
indexes for series having patterns which underwent little or no change 
during the period with which we were concerned. Some time series have 
seasonal patterns which change. Changes may be progressive — which is 
to say that the seasonal pattern varies gradually from year to year — or 
they may be of a more abrupt nature, reflecting, for example, changes 
in the date of Easter or the shifting date of some important event, such 
as the New York automobile show in the fall of 1935, which was men- 
tioned in Chapter 11. 

PROGRESSIVE CHANGES IN SEASONAL PATTERN 

A moving seasonal. Chart 15.1 shows monthly data of the linage 
of magazine advertising in the United States from July 1942 to June 1953. 
As will be clear later, this series has a progressive change in seasonal 
pattern: the pattern is not the same throughout the period with which we 
are concerned. This is often referred to as a moving seasonal. From a 
chart such as Chart 15.1, it is not always possible to ascertain whether 
the seasonal pattern is fixed or moving. To make this decision, it is 
usually necessary to proceed part way with the seasonal analysis (through 
step 2 of the procedure which follows); luckily, the initial steps are the
same for the determination of either a constant or a moving seasonal.

Computation of a moving seasonal index. A moving seasonal
index may be obtained as follows:

1. Compute a centered 12-month moving average of the original data.
Since the procedure is exactly the same as shown in Columns 2, 3, and 4
of Table 14.5 for the data of newsprint consumption, the computation
of the moving average is not shown here. However, the moving average
is shown graphically in Chart 15.1.

2. Express the original data as percentages of the moving average.
These figures are shown in Table 15.1.

3. Plot the data of Table 15.1 on 12 charts, one chart for each month, as
shown in the 12 parts of Chart 15.2. These 12 monthly charts may be
drawn on separate sheets of graph paper or on one large sheet, as may be
convenient. In any event, they should not be too small in view of the
use which is to be made of them in the next two steps.



Chart 15.1. Magazine Advertising in the United States, July 1942-June
1953, and Twelve-Month-Centered Moving Average, January 1943-December
1952. Data from various issues of the Survey of Current Business. Moving average
computed as shown in Table 14.5.


4. Reference to the January portion of Chart 15.2 shows that January
has a downward trend. June, July, August, November, and December
also have downward trends. Several months show upward trends, for
example, March, April, September, and October. The monthly trends
may be linear or non-linear. Also (although Chart 15.2 does not show
a good example of this) a month may have a trend which declines and then
rises, or vice versa. The fourth step consists of determining a trend for
each of the 12 monthly charts. This may be done by drawing freehand
trend lines, by fitting mathematical curves, or by using a moving average
(for example, a five-term moving average) as a guide and smoothing the
moving average freehand. However the trend lines are obtained,
they should be relatively simple curves and should not slope too steeply,
up or down, at the ends. It must be realized that the trends we are con-
cerned with here are not affected by the same forces that are associated
with secular trend. The monthly trends are very unlikely to continue in
a given direction indefinitely, but are more likely to move to a certain
level and then remain more or less stable until new factors bring about a
change from that level. For purposes of illustration, the 12 trend lines
in Chart 15.2 were drawn freehand. If we wish to have a seasonal index
for a year later than that shown in a chart such as Chart 15.2, in order to
deseasonalize the monthly data as they become available, we may use the
seasonal index for the last year shown (as is done in Table 15.3) or we
may extend the monthly trend lines.

Chart 15.2. Monthly Charts to Assist in Determina-
tion of Moving Seasonal Index for Magazine Advertising
in the United States, 1943-1952. Data from Table 15.1.
To avoid obscuring details, these charts show no guide lines.
When used to aid in the computation of a moving seasonal index,
charts such as these would have finely ruled grids. The values
in Table 15.2 are read from the smooth curves. The values in
Table 15.3 are the dots which are just above, just below, or on
the smooth curves.

5. From the monthly charts of Chart 15.2, read the trend values and
enter them in a table. These are first approximations of the moving
seasonal and are shown in Table 15.2.

6. It will be noticed that the 12 values for each year, shown in Table
15.2, in no instance total 1,200.0. The final step consists of adjusting the
first approximation figures of Table 15.2 so that each annual total will be
1,200.0, but at the same time retaining smooth, well-fitting trends for the
12 parts of Chart 15.2. The results of this step are shown in Chart 15.2
by means of dots and in Table 15.3, which gives the moving seasonal
index. Note that the total for each year is now 1,200.0. If the 12
monthly trend lines are linear, they may be fitted mathematically by a
procedure¹ which automatically results in the annual totals each being
1,200.0.

Chart 15.3. Moving Seasonal Index for Magazine Advertising in the United
States, 1943-1952. Data from Table 15.3.
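One simple way to force a year's 12 first-approximation values to total 1,200.0 is to scale them proportionally, as in the Python sketch below. This is an assumption made for illustration, not the procedure of the text, which adjusts the figures by judgment so that the 12 monthly trends also remain smooth. The values and the function name are hypothetical.

```python
def adjust_to_1200(first_approximations):
    """Scale one year's 12 values proportionally so that they total 1,200.0."""
    total = sum(first_approximations)
    return [value * 1200.0 / total for value in first_approximations]


# Hypothetical first approximations for one year (they total 1,201.5).
year_values = [88.0, 93.5, 104.0, 108.5, 106.0, 95.0,
               82.5, 90.0, 104.5, 113.0, 112.0, 104.5]

adjusted = adjust_to_1200(year_values)
print(round(sum(year_values), 1), round(sum(adjusted), 1))
```

After any such mechanical adjustment the adjusted points should still be plotted on the 12 monthly charts to confirm that each month's trend remains smooth.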

The moving seasonal pattern for magazine advertising is shown
graphically in Chart 15.3. Note how the relative importance of March
and April changes over the period; note also that the mid-year low, which
was June in 1943, gradually shifts to July. Another interesting point
brought out clearly in Chart 15.3 is the gradual change in the amplitude
of the seasonal variation over the period.

¹ See R. J. Foote and Karl A. Fox, Seasonal Variation: Methods of Measurement and
Tests of Significance, pp. 6-7, issued September 1952 by the Bureau of Agricultural
Economics as Agricultural Handbook No. 48.

The reader may have noted that steps 4 and 6 in the determination
of a moving seasonal index may involve subjective considerations. This
does not constitute a weakness in the procedure, but it does suggest that
better results are more likely to be obtained by an experienced worker
who is familiar with the series being studied than by one not so well
equipped. The procedure for obtaining a moving seasonal index, which
has been described in the preceding paragraphs, is occasionally modified
by using a 12-month moving average, not centered, but arbitrarily placed
opposite the seventh (or sixth) month.

If a series that contains a moving seasonal is deseasonalized by a con-
stant seasonal index, the adjusted data will contain not only the irregular
movements actually present in the series, but additional irregularities
where the constant seasonal index has undercorrected or overcorrected.
Unless one knows that the series with which he is working has a fixed
seasonal movement, it is always wise to make the 12 monthly charts of
Chart 15.2. These will reveal whether a moving seasonal is present; if
the seasonal is constant, the trends will be horizontal lines.

Footnote 5 of Chapter 14 pointed out that a 12-month moving average
may not move high enough into cyclical peaks or low enough into cyclical
troughs. Partly to correct for this characteristic of the moving average,
the Division of Research and Statistics of the Board of Governors of the
Federal Reserve System uses a more complex procedure² than the one
just illustrated. Here are the bare outlines of the Federal Reserve
method:

The main nonseasonal movements are determined as follows:

1. Compute a 12-month moving average centered at the seventh
month.

2. Plot the original data and moving average on an arithmetic grid.

3. Draw a freehand curve through the curve of the original data,
wherever the moving-average curve seems to fail adequately to
describe the main nonseasonal movements.

4. Read and record the monthly values from the moving-average curve
as modified.

Typical differences between the unadjusted values and the main non-
seasonal movements are next obtained:

5. Express the original values as percentages of the values of the main
nonseasonal series obtained in step 4.

² For a full description, see "Adjustment for Seasonal Variation," by H. C. Barton,
Federal Reserve Bulletin, June 1941, pp. 518-528.
6. Make 12 monthly charts (one each for January, February, March,
and so on) of the ratios obtained in step 5.

7. Draw a freehand trend line for each monthly chart. This is termed
"averaging the ratios for each month."

8. Read the values from the freehand lines of step 7 and adjust the 12
values for each calendar year so that they total 1200 or "depart
from this total by no more than an amount that can be accounted for
by some special circumstance affecting the series." These are the
preliminary seasonal indexes.

9. Using the figures of step 8, compute a preliminary series adjusted for
seasonal variation.

The preliminary index is then revised:

10. Plot the preliminary adjusted series on the chart of step 2.

11. Repeat steps 3 through 10 for all locations where the original free-
hand curve departs from the general movements of the preliminary
adjusted series. This procedure results in revised preliminary sea-
sonal indexes and a revised preliminary adjusted series.

12. Plot the revised preliminary adjusted series on a year-over-year
chart similar to Chart 14.5.

13. Examine the year-over-year chart by reading it vertically to see
whether there are months (or groups of months) showing recurring
movements of a seasonal nature.

14. Make a final revision of the seasonal values of step 11 (modifying the
curves of step 7) to eliminate, as far as possible, all recurring move-
ments, shown in the year-over-year chart, that seem to be seasonal in
nature. The 12 values for each calendar year should ordinarily con-
tinue to total 1200. A final check of the deseasonalized data may be
made on a year-over-year chart.

It must be clear that the Federal Reserve procedure differs in two
respects from the method used in this text: first, the moving average
(which is not centered) is modified by a freehand curve; and second, the
seasonal index first obtained (step 8) is twice revised. This method
requires knowledge of the field represented by the data and a high order
of judgment. In the words of the article mentioned in footnote 2, it
"requires a higher grade of work and somewhat more time than most
mechanical methods." For the less erratic series, it was found that
determining and eliminating seasonal for data covering a 14-year period
required about a half-day's work of a professional nature and two days of
clerical work. The author of the article adds: "Time spent in this way,
however, yields more accurate seasonal adjustments than can be obtained
by applying an inflexible mathematical process, and in addition yields a
knowledge of other characteristics of the underlying series that is valuable
on its own account."
SUDDEN VARIATIONS IN SEASONAL PATTERNS

Seasonal patterns may change abruptly, rather than gradually, and
then the device of a moving seasonal would be inapplicable. Such
changes may involve merely the relative importance of two consecutive
months, or may involve a change in the entire pattern. The most fre-
quently encountered change of the first type is that occasioned by the
varying date of Easter.

Adjustment for Easter.³ A number of statistical series are affected
materially by changes in the date of Easter, which may range from
March 22 to April 25. Retail sales and money in circulation are two of
the series so affected. Department store sales, in particular, show the
effects of the customary apparel purchases before Easter. A late Easter
will tend to make April sales heavy relative to March, and, within limits,
the later in April that Easter occurs, the greater is this tendency. On the
other hand, when Easter occurs in March, March sales and possibly
February sales will be increased.

A procedure used by the Federal Reserve System for making Easter
adjustments in the department store sales series is as follows:

1. Compute preliminary seasonal adjustment factors.⁴ These should
eliminate, so far as possible, seasonal fluctuations other than those caused
by changes in the date of Easter. If a moving seasonal has been com-
puted, the factors will vary from year to year, as shown in Columns 3
and 6 of Table 15.4.

2. Using these factors, compute seasonally adjusted index numbers for
March and April of each year. These are shown in Columns 4 and 7 of
Table 15.4.

3. Next, compute the percentage change from March to April in these
preliminary seasonally adjusted indexes. These changes, which are shown
in Column 8 of Table 15.4, do not, however, reflect solely the influence of
Easter, but also the general trend of the series in the course of cyclical,
secular, or short-term movements. Therefore, it is necessary to adjust
further for short-term trends.

4. Derive approximate adjustments for short-term trend. If the method
of seasonal adjustment used is that described in the June 1941 Federal
Reserve Bulletin, the March and April figures for each year can be read

³ This section was prepared initially by Robert E. Lewis, formerly economist with
the Federal Reserve Bank of New York and now economist with the National City
Bank of New York. The procedure is taken in part from pp. 1472-1473 of the
December 1951 Federal Reserve Bulletin. The examples shown are based on the
experience of the Federal Reserve Bank of New York.

⁴ In this instance, the procedure used was that described on the preceding pages
and referred to as the Federal Reserve method.
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TABLE 15.4 

Computation of March-April Percentages of Change in Preliminary Season 
ally Afijusted Indexes of Department Store Sales in the Second Federal 
Reserve District^ 19I9-1'953 



~ 

March 



April 



Year 

Un- 

adjusted 

Chang- 

Pre- ' 
lini inary 
seasonally 

Un- 

adjusted 

Chang- 

Pre- 

liminary 

seasonally 

Per cent 
change 
Mar. to 

index* 

ing 

adjusted 

index* 

ing 

adjusted 

Apr 


1947-49 

sea- 

sonal 

index * 

1947 49 

sea- 

sonal 

index* 

Col. (7) 


= 100 

Col. (2) - 

= 100 

Col. (5) ^ 

Col. (4) 


(2) 

(3)_^ 

Col. (3) 
(4) 

(.5) 

(0) 

Col. (6) 
(7) 

C8) 

1919 

27 2 

95 

* 28 0 "^ 

32.9 

100 ' 

32 9 

+ 15 0 

20 

39 2 

95 

41 3 

39 0 


39 0 

- 5 0 

21 

37 9 

94 

40 3 

38 9 

100 

38 9 

- 3 5 . 

22 

35 2 

92 

38 3 

41 1 

100 

41 1 

-f 7 3 

2:1 


91 

43 2 

42 0 

100 

42 0 

- 2 8 

24 

38 8 

90 

43 1 

44 0 

100 

44 0 

4-3 5 

25 

40 9 

89 

40 0 

45 7 

99 

40 2 

+ 04 

20 

42 0 

89 

47 2 

15 5 

98 

40 1 

- 1 7 

27 

42 2 

89 

17 4 

49 2 

98 

50 2 

+ 59 

28 

43 1 

89 

48 4 

47 ] 

98 

18 1 

- 0 0 

29 

48 0 

89 

54 0 

1‘ 3 

98 

49 3 

- 9 7 

1930 

45 5 

89 

51 1 

53 1 

98 

51 2 

+ 0 1 

31 

45 i 

89 

50 7 

49 1 

98 

50 1 

- 1 2 

32 

35 3 

89 

i 39 7 

38 3 1 

98 

39 1 ; 

- 1 5 

33 

27 9 

89 

31 3 i 

35 9 j 

98 j 

30 0 j 

+ 16 9 

34 

30 8 i 

89 

41 3 

35 9 1 

98 1 

36 0 

-114 

35 

33 0 

89 

37 1 ! 

30 0 1 

98 ; 

37 3 i 

+ 05 

30 1 

35 4 

89 

39 8 ; 

.ij 0 i 

98 , 

39 8 ; 

0 

37 

39 3 

89 

I 44 2 : 

40 7 i 

98 

41 5 ; 

- 0 1 

38 

3 I 0 

87 

39 8 ! 

40 7 j 

98 j 

41 5 1 

f 4 3 

39 

35 0 

87 

10 9 1 

40 1 

98 ' 

40 9 1 

0 

1910 1 

37 0 

87 

42 5 ! 

38 5 ; 

98 j 

39 3 j 

- 7 5 

41 

39 2 

87 

15 1 . 

40 5 ; 

98 1 

47 4 1 

+ 5 1 

42 

IS 4 

90 

53 8 ; 

19 3 ' 

97 ; 

50 8 ! 

-'5 0 

43 

47.2 

94 

50 2 ; 

53 3 

97 j 

54 9 

+ 94 

44 

57 0 

98 

58 2 ' 

50 4 1 

90 I 

58 8 I 

+ 1 0 

45 

72 4 

99 

73 1 : 

58 9 ; 

90 

01 4 ' 

-10 0 

40 

85 2 ; 

99 

80 1 i 

90 5 1 

90 

94 3 i 

+ 95 

47 

94 7 

98 

90 0 1 

92 5 

90 

90 4 1 

- 0 2 

48 

97 3 

95 

102 1 

98 0 

90 

102 7 

+ 0 3 

49 

80 4 

89 

97 1 

99 3 

90 

l(-3 4 

+ 05 

1950 

80 7 

89 

97 4 

'3 9 

90 

d 7 8 

+ 04 

51 

94 8 

89 

100 5 

' 9o 0 

90 

99 0 

— 0 5 

52 1 

87.9. 1 

89 

98 8 

' 90 5 

90 

100 5 

+ 1.7 

53 1 

93 3 1 

89 

101 8 

95 5 

90 

99 5 

- 5.1 


* While department store indexes are not ordinarily published to one decimal place, an exception has
been made in this case in order to avoid distortion due to rounding in the comparisons for the early
years.

Data from Federal Reserve Bank of New York.
TABLE 15.5

Determination of Net Easter Changes for Department Store Sales in the
Second Federal Reserve District, 1919-1953


Columns:
(1) Year
Determination of short-term trend:*
(2) March
(3) April
(4) Per cent short-term change, March to April, [Col. (3) ÷ Col. (2)]
(5) Per cent change, March to April, from Table 15.4
(6) Net Easter changes, Col. (5) - Col. (4)
(7) Date of Easter

1919 ' 

Yo’y' 

31 7 

+2 0 ‘ 

-f 15 0 

412 4 

April 20 

20 1 

40 9 

41 1 

1 ^-0 5 ■ 

-- r> 0 

- () I 

Al)ril 4 

21 

39 ^ 

39 0 

; -0 5 i 

*- 3 5 

- .1 0 

March 27 

22 ' 

39 1 

39 1 

i 9 0 S 

-f’ 7 3 

4' i' 5 

April 10 

23 ; 

42 2 

42 5 

+0 7 

- 2 S 

3 5 

\pnl 1 

24 : 

44 3 

41 3 

0 1 

4“ 3 5 

■j' 3 5 

April 20 

25 

40 2 

lo 3 

4-0 2 1 

4- 0 1 

4 0 2 

April 12 

20 

48 H 

IS 3 

0 ! 

- I 7 

- 1 7 

Ajinl 4 

27 

49 1 

19 0 

' -0 2 j 

-f 5 9 

4 0 ! 

April 17 

28 ' 

49 0 

49 

0 : 

- 0 0 

- 0 0 

April 8 

29 ' 

52 0 

52 2 

-f 0 1 

- 9 ; 

]l) 1 

March 3 1 

19H0 i 

52 .s 

52 7 

--0 2 

4' i ) 1 

1 0 3 

\prii 20 

HI , 

49 0 

01 

0 

- 1 2 

1 2 

Aj>ul 5 

32 

39 1 

.IS * 

- 2 0 

- i 5 

-i- ii 5 

.March 27 

33 1 

33 S 

n 1 

rO •) I 

4-l‘l 9 

Ml) 0 

Apnl lb 

3 \ ; 

30 S 

3tj s 

’ 0 

-114 

11 1 

\pi»I 1 

H5 : 

37 2 

37 3 

, -f 0 3 

4. 0 5 

i 4 0 2 

\pril ‘21 

HO , 

39 9 

' 10 2 

rt) S 

0 

j -OS 

April 12 

37 

13 5 

43 1) 

4-0 2 

- »■) 1 

- 0 3 

Mar(4i 2S 


11 9 

' U) (» 

: -10 

1 3 

-r 5 .{ 

' April 17 


111 ‘j 

10 1 

rO 5 

0 

• 0 5 

1 April 9 

1911) 

12 2 

12 3 

tO 2 

7 5 

- ^ 7 

' March 21 

u 

ih li 

47 0 

ft) 9 

f 5 1 

4- 1 2 

: April l.'i 

-12 

51 2 

51 1 

’ 0 A 

- 5 r. 

- 0 0 

' .\]>ril 5 


51 ( 

51 

] 4-0 A 

4 9 4 

■0 9 0 

April 25 

\ \ 

5^ 4 

' 59 0 

f ! 0 

4- 1 0 

0 

; April 0 


Ith 

, in 2 

f-0 ). 

-lf» 1) 

10 b 

1 April 1 

-M) 

S'. 5 

' .S'l 0 

4 2 0 ! 

9 5 

t 0 0 

1 Ajinl 21 

1 i 

m 7 i 

, “S 0 

-- 0 o 

- 0 2 

-- 1 1 

1 Apnl 0 

-iS 

ii'.: ! 

!02 s 

1 0 1 

4- 0 

- 0 1 

! March 2S 

r, 

M i () 

9') 1 

■fi S 

4 4 5 

. 4 7 3 

1 Apnl 1/ 

I93i) 

7 

97 

4 0 9 

4' 0 1 

, - 0 5 

i April 9 

ol 

107 0 

' l<»b s 

- )) 2 

— 1) 5 

1 -- 6 3 

. March 25 

52 

100 5 

OH) 3 

-02 

4- ! r 

4 1 9 

1 April 13 

53 

102 1 

102 A 

0 

-- 5 1 

, - 5 1 

1 April 5 

♦ VuL 

. - II! ' 1., 

K M 1 J ,1 

'll ««Mi‘ ft , Ilf fut'n ■! 1 )iurt .‘xplaUi'Ml in hli'p 1 



Data in Columns 3 and 4 from Federal Reserve Bank of New York. Figures in Column 5 from
Table 15.4.



Chap. 15] 


CHANGING SEASONAL PATTERNS


355 


from the chart of the revised freehand curve. Alternatively, the March
and April figures can be read from a freehand curve drawn through a chart
of the preliminary seasonally adjusted series. Percentage changes
between these March and April figures are then computed to give a rough
measure of the month-to-month movement attributable to short-term
trend. See Columns 2, 3, and 4 in Table 15.5.

5. Obtain net Easter changes by subtracting algebraically the short-term
trend percentages from the original March-April changes computed in
step 3. In other words, the original changes are lowered slightly when
the general movement or trend of the seasonally adjusted index during
the first half of the year is upward, and they are raised slightly when the
general movement is downward. These net Easter changes are shown
in Column 6 of Table 15.5.

Chart 15.4. Date of Easter and Net Easter Change for Department-Store
Sales in the Second Federal Reserve District, 1919-1953. Data from
Table 15.5.

6. To confirm that these net Easter changes actually do vary in
accordance with the date of Easter, we have plotted, year by year, these
changes and the date of Easter. (See Chart 15.4, which uses data from
Table 15.5.) It is apparent that there is a marked tendency for April to
show a greater percentage increase over March when Easter is late and a
smaller increase or a decline when Easter is early. However, this chart
does not tell us how much on the average April sales are increased over
those of March for each additional day later that Easter occurs. Such
an estimate can be obtained by plotting the net Easter changes, not by
years, but with the Easter date along the horizontal axis, as in Chart 15.5.

7. Fit a freehand trend line to the data shown in Chart 15.5. The
estimating line may be fitted mathematically if desired, but it would seem
preferable to be able to discount those years when unusual factors
affected data for March and April. (For department stores, sales were
reduced in March 1933 because of the bank holiday and in April 1945
because many stores were closed at the time of President Roosevelt's
death.)

Chart 15.5. Net Easter Change in Relation to Date of Easter and Graphic
Estimate of Gross Easter Correction Factor for Department-Store Sales in
the Second Federal Reserve District, 1919-1953. Data from Table 15.5. The
curve serves as a guide for determining the stepped line, from which the
gross correction factors are read. These are then entered in Column 2 of
Table 15.6.

It should be noted that this line is horizontal throughout March. If
Easter occurs at any time from March 22 through April 1, no pre-Easter
sales will be made in April, no matter when in this period Easter occurs.
Conceivably, a very early Easter could mean increased February sales
relative to March, but the difference is not ordinarily great enough to
warrant a special adjustment.
TABLE 15.6

Easter Correction Factors for Department Store
Sales in the Second Federal Reserve District

                 Gross       Net correction   Net correction
  Date of      correction      factor for       factor for
  Easter         factor          March            April
    (1)           (2)             (3)              (4)

 March 22         -6              +3               -3
       23         -6              +3               -3
       24         -6              +3               -3
       25         -6              +3               -3
       26         -6              +3               -3
       27         -6              +3               -3
       28         -6              +3               -3
       29         -6              +3               -3
       30         -6              +3               -3
       31         -6              +3               -3
 April  1         -6              +3               -3
        2         -6              +3               -3
        3         -6              +3               -3
        4         -6              +3               -3
        5         -4              +2               -2
        6         -4              +2               -2
        7         -4              +2               -2
        8         -2              +1               -1
        9         -2              +1               -1
       10          0               0                0
       11          0               0                0
       12          0               0                0
       13         +2              -1               +1
       14         +2              -1               +1
       15         +4              -2               +2
       16         +4              -2               +2
       17         +6              -3               +3
       18         +6              -3               +3
       19         +6              -3               +3
       20         +8              -4               +4
       21         +8              -4               +4
       22         +8              -4               +4
       23         +8              -4               +4
       24         +8              -4               +4
       25         +8              -4               +4

The data of Column 2 were read from Chart 15.5.


8. Read off the gross correction factor for each date of Easter from the
trend line to the nearest even number. These figures are shown in
Column 2 of Table 15.6.

9. Divide the gross correction factor by two to obtain the net correction
factors. April sales gain by a late Easter what March sales lose, and vice
TABLE 15.7

Adjustment of March and April Seasonal Index Numbers of Department
Store Sales in the Second Federal Reserve District for Variation in the
Date of Easter, 1919-1956

                                      March seasonal            April seasonal
         Date of     Net correc-                 Corrected                 Corrected
 Year    Easter      tion factor*  Uncorrected   Col. (4) -   Uncorrected  Col. (6) +
                                                 Col. (3)                  Col. (3)
  (1)      (2)           (3)           (4)          (5)           (6)         (7)

 1919   April 20         +4            95            91           100         104
 1920   April 4          -3            95            98           100          97
 1921   March 27         -3            94            97           100          97
 1922   April 16         +2            92            90           100         102
 1923   April 1          -3            91            94           100          97
 1924   April 20         +4            90            86           100         104
 1925   April 12          0            89            89            99          99
 1926   April 4          -3            89            92            98          95
 1927   April 17         +3            89            86            98         101
 1928   April 8          -1            89            90            98          97
 1929   March 31         -3            89            92            98          95
 1930   April 20         +4            89            85            98         102
 1931   April 5          -2            89            91            98          96
 1932   March 27         -3            89            92            98          95
 1933   April 16         +2            89            87            98         100
 1934   April 1          -3            89            92            98          95
 1935   April 21         +4            89            85            98         102
 1936   April 12          0            89            89            98          98
 1937   March 28         -3            89            92            98          95
 1938   April 17         +3            87            84            98         101
 1939   April 9          -1            87            88            98          97
 1940   March 24         -3            87            90            98          95
 1941   April 13         +1            87            86            98          99
 1942   April 5          -2            90            92            97          95
 1943   April 25         +4            94            90            97         101
 1944   April 9          -1            98            99            96          95
 1945   April 1          -3            99           102            96          93
 1946   April 21         +4            99            95            96         100
 1947   April 6          -2            98           100            96          94
 1948   March 28         -3            95            98            96          93
 1949   April 17         +3            89            86            96          99
 1950   April 9          -1            89            90            96          95
 1951   March 25         -3            89            92            96          93
 1952   April 13         +1            89            88            96          97
 1953   April 5          -2            89            91            96          94
 1954   April 18         +3            89            86            96          99
 1955   April 10          0            89            89            96          96
 1956   April 1          -3            89            92            96          93

* To be added algebraically to April and subtracted algebraically from March.

Data in Columns 2 and 3 from Table 15.6. Figures in Columns 4 and 6 from Federal Reserve Bank
of New York.





versa; therefore, half of the gross correction factor is subtracted from one
month and half is added to the other. These net factors are shown in
Columns 3 and 4 of Table 15.6.

10. Finally, add the net correction factors algebraically to the April
seasonal adjustment factors and subtract them algebraically from the March
factors, as in Table 15.7. The resulting seasonal factors are the ones
applied to the unadjusted index numbers to obtain the published
seasonally adjusted series.
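
Steps 8 to 10 can be sketched in a few lines of Python. This is only an illustration, not the Bank's own procedure: the stepped gross factors are transcribed from Column 2 of Table 15.6, and the function names gross_factor and corrected_seasonals are hypothetical names introduced here.

```python
from datetime import date

# Stepped gross correction factors transcribed from Table 15.6, keyed by the
# last Easter date (month, day) to which each step applies.
GROSS_STEPS = [
    ((4, 4), -6),   # March 22 through April 4
    ((4, 7), -4),   # April 5 through April 7
    ((4, 9), -2),   # April 8 and 9
    ((4, 12), 0),   # April 10 through April 12
    ((4, 14), 2),   # April 13 and 14
    ((4, 16), 4),   # April 15 and 16
    ((4, 19), 6),   # April 17 through April 19
    ((4, 25), 8),   # April 20 through April 25
]


def gross_factor(easter: date) -> int:
    """Step 8: read the gross correction factor for a given date of Easter."""
    key = (easter.month, easter.day)
    if key < (3, 22) or key > (4, 25):
        raise ValueError("Easter falls between March 22 and April 25")
    for last_day, factor in GROSS_STEPS:
        if key <= last_day:
            return factor
    raise ValueError("date not covered by the table")


def corrected_seasonals(easter: date, march_index: float, april_index: float):
    """Steps 9 and 10: halve the gross factor, subtract it from the March
    seasonal and add it to the April seasonal."""
    net = gross_factor(easter) / 2
    return march_index - net, april_index + net


# 1919: Easter on April 20, uncorrected seasonals 95 (March) and 100 (April).
march_1919, april_1919 = corrected_seasonals(date(1919, 4, 20), 95, 100)
print(march_1919, april_1919)
```

For 1919 the sketch reproduces the first row of Table 15.7: the March seasonal falls from 95 to 91 and the April seasonal rises from 100 to 104.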

Once a satisfactory set of Easter adjustments has been derived, the
entire set of computations need not be repeated each year. Easter
correction factors can be read from a table, such as Table 15.6, and applied
to the basic seasonal factors, as projected in Table 15.7 through 1956.
Every few years, however, the additional experience should be used to
review the adequacy of the Easter adjustment, just as in the case of
changing basic seasonal factors.¹

Sudden changes in entire seasonal pattern. Prior to 1935 it had
been customary to hold the New York automobile show in January of
each year. It was mentioned, in Chapter 11, that in 1935 a show was
held, not only in January, but also in November, the November show
being in lieu of the show originally planned for January. For some
years thereafter the show was held in November. The importance of the
New York show stemmed from the fact that it was at these shows that
most new models of automobiles were revealed to the public. Before
1935 the seasonal movement of automobile sales showed a high in the
spring (a few months after the show) and a low in the fall and winter.
From 1935 until the beginning of World War II, two seasonal highs each
year were in evidence: one in the spring and one very late in the year.

When a sudden change in an entire seasonal pattern occurs, it is merely
necessary to compute two seasonal indexes, one for the period preceding
the change and one for the years following the change. The two indexes
may be either constant or changing, whichever is appropriate for the series.

Short-time shifts in timing. The varying date of Easter affects
materially only March and April; changing the date of the automobile
show affected chiefly a few months preceding and following it. Weather
conditions, however, which also vary from year to year, may result in
early harvests one year and late harvests the next, and not only may the
marketing of the product begin at different times in different years, but
¹ A method of making adjustments for the changing date of Easter in
weekly seasonal adjustment factors has been worked out by the Federal Reserve Bank
of Cleveland. See "Description of Method of Computation of the Weekly Index of
Department Store Sales, Fourth Federal Reserve District," pp. 4-9 (mimeographed,
July 1952).
the flow of goods during the entire year may be affected, the effect being
to shift the whole pattern a few months to the left or right. Likewise, 
consumer demand may vary in timing, depending on how early the 
weather changes. 

Such shifting seasonal patterns present a difficult problem. Perhaps 
the most practical solution is to regard the situation as a special case of a 
sudden change in entire pattern, to group together the years (not neces- 
sarily adjacent) which show the same timing in their seasonal turns, and 
to compute as many seasonal indexes as there are groups of years. In 
computing such indexes, there is no reason why the calendar year must be 
taken as a unit. Rather, if the subject matter has to do with agriculture,
the year should be related to the crop year. Perhaps the central month
should be the seasonal high or the seasonal low. 

Varying amplitude. Some economic series retain more or less the 
same general seasonal pattern from year to year but have a tendency to 
vary either gradually or suddenly in amplitude. This is particularly 
true of stocks of agricultural commodities. For example, stocks of agri- 
cultural crops show varying seasonal amplitude from year to year depend- 
ing upon the amount carried over from the preceding year, the size of the 
harvest, and the amount consumed. Likewise, shipments of livestock
are likely to vary in the amplitude of their seasonal swing. Here the
variation may have something to do with the advantage of immediately
selling the livestock, as compared with holding them for further fattening
or a price increase. Since the relative advantages of these policies
(discussed on page 145) are likely to vary in cycles, so the amplitude of the
seasonal variation is likely to change in cycles, and the change in pattern
might conceivably be treated as a moving seasonal. Another case is that
of increased seasonal amplitude in manufacturing, brought about by a
general cyclical tendency toward hand-to-mouth buying. It is apparent
that this change also might be thought of as a moving seasonal, the
progression being cyclical rather than trend-like.

It must be apparent that, when the amplitude of a seasonal movement 
is not changing gradually but changing suddenly, and in the main unpre- 
dictably, a moving seasonal cannot overcome the difficulty any better 
than it can that of short-time shifts in the entire seasonal pattern. Any 
of the types of seasonal indexes hitherto described would in some years 
show too great amplitude and in other years too small amplitude. The 
method of correcting a seasonal index for sudden changes in amplitude 
is somewhat akin to the adjustment for the changing date of Easter. It 
will not be described in detail in this volume,² but in general the procedure

² For a full description, with tables and charts, see the first edition of this text, pp.
5ia-52^.
consists of determining the relationship that exists for the 12 months of
each year between (1) the seasonal index expressed as deviations from 100
and (2) the percentage deviations of the original values from the 12-month
centered moving average, the latter percentage deviations being adjusted
to average zero. The relationship between the 12 pairs of values for each
year yields an amplitude ratio which indicates the correction to be applied
(by multiplication) to the original seasonal values expressed as deviations
from 100. To each of these deviations 100 is then added.
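
The amplitude correction just outlined can be sketched as follows. The text does not say how the "relationship" between the 12 pairs of values is summarized; the sketch assumes, purely for illustration, a least-squares slope through the origin, and the names amplitude_ratio and apply_amplitude are hypothetical.

```python
def amplitude_ratio(seasonal_index, pct_dev_from_ma):
    """Amplitude ratio for one year from its 12 pairs of values.

    seasonal_index: the 12 seasonal index numbers (per cent).
    pct_dev_from_ma: percentage deviations of the original values from the
        12-month centered moving average, same months.
    Assumption: the ratio is taken as the least-squares slope, through the
    origin, of the actual deviations on the seasonal deviations.
    """
    seasonal_dev = [s - 100 for s in seasonal_index]
    mean_dev = sum(pct_dev_from_ma) / len(pct_dev_from_ma)
    actual_dev = [d - mean_dev for d in pct_dev_from_ma]  # adjusted to average zero
    numerator = sum(x * y for x, y in zip(seasonal_dev, actual_dev))
    denominator = sum(x * x for x in seasonal_dev)
    return numerator / denominator


def apply_amplitude(seasonal_index, ratio):
    """Multiply each deviation from 100 by the ratio, then add 100 back."""
    return [100 + ratio * (s - 100) for s in seasonal_index]


# Hypothetical year whose swings are one and one-half times the index swings.
index = [80, 90, 100, 110, 120, 100, 90, 110, 100, 100, 95, 105]
actual = [1.5 * (s - 100) for s in index]
ratio = amplitude_ratio(index, actual)
adjusted = apply_amplitude(index, ratio)
print(ratio, adjusted[0])
```

In this made-up year the ratio comes out at 1.5, so an index number of 80 (a deviation of -20) becomes 70 (a deviation of -30).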

A word of caution may be in order: if a moving seasonal has been used, 
a change in the amplitude ratio does not necessarily indicate a change 
in the seasonal amplitude of the original data. A gradual increase in 
the seasonal amplitude, for instance, would be reflected in the moving
seasonal index rather than in the amplitude ratio, but the moving seasonal
would fail to register any sudden departures from the general trend in
amplitude change. 

FURTHER REFINEMENTS OF METHOD

Continuity of seasonal indexes. A stable seasonal index averages
100 per cent, not only for the 12-month period selected for the index, but
for any consecutive 12-month period. The latter, however, is not true
for any of the seasonals explained in this chapter, though in the case of a
progressive or moving seasonal the discrepancy is nominal only.
Particularly in the case of seasonal indexes corrected for variations in
amplitude, however, the discrepancy may assume alarming proportions. The
difficulty manifests itself in discontinuity of the seasonally adjusted data
at the point where one year ends and the next begins. Let us assume, for
instance, that the unadjusted seasonal index numbers for December 1952
and January 1953 are each 80 per cent, the amplitude adjustment to be
applied, let us say, to calendar years. Now, suppose further that the
amplitude ratios are 0.5 and 1.5, respectively. This makes the adjusted
December 1952 index number 40 per cent and the January 1953 number
120. It is apparent that there will be an enormous drop in the seasonally
adjusted data between December and January. Yet a little thought will
convince one that the change in amplitude does not take place entirely
in a month's time, but represents a transition of several months' duration.

Although there is no entirely satisfactory solution for this difficulty,
one remedy, which is very laborious, is to compute an amplitude ratio for
each consecutive 12-month period of the entire series. For instance, if
the data ran from 1943 through 1953, the first 12-month period would run
from January 1943 through December 1943, the second from February
1943 through January 1944, and so on. Altogether there would be 121
such 12-month periods and the same number of amplitude ratios. We
could speak of these ratios collectively as a moving amplitude ratio.
Following the analogy of a 12-month moving average, these ratios should
be centered by a 2-month moving average, leaving 120 amplitude ratios,
running from July 1943 through June 1953. The seasonal index numbers
are then multiplied by these amplitude ratios to obtain the final seasonal
index numbers.
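
The count of periods and the centering can be checked with a short sketch. The ratios themselves are placeholders here (all set to 1.0), since only the bookkeeping is being illustrated; the function names are hypothetical.

```python
def center_two_term(ratios_by_window):
    """Center a run of 12-month-period ratios with a 2-month moving average."""
    return [(a + b) / 2 for a, b in zip(ratios_by_window, ratios_by_window[1:])]


def month_label(index, start_year=1943):
    """Convert a 0-based month offset from January of start_year to (year, month)."""
    years, month0 = divmod(index, 12)
    return start_year + years, month0 + 1


n_months = 11 * 12                 # January 1943 through December 1953
n_windows = n_months - 12 + 1      # consecutive 12-month periods
placeholder_ratios = [1.0] * n_windows
centered = center_two_term(placeholder_ratios)

# The average of the period starting at month i and the one starting at
# month i + 1 is centered on month i + 6.
first = month_label(6)
last = month_label(6 + len(centered) - 1)
print(n_windows, len(centered), first, last)
```

The sketch gives 121 periods and 120 centered ratios, the first centered on July 1943 and the last on June 1953, in agreement with the figures in the text.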

This procedure is laborious, but it is not entirely satisfactory.
Although there is no sharp break in the continuity of the series, it has the
defect that not any 12 consecutive seasonal index numbers are centered
on 100 per cent. A less accurate but also much less laborious procedure
than the one just described is to compute an amplitude ratio for each
standard year, center the ratio on the sixth or seventh month, and
interpolate arithmetically from one year to the next.
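
The shorter procedure might be sketched as below. Two points are assumptions not settled by the text: each yearly ratio is centered on July (the seventh month), and months before the first center or after the last simply keep the nearest year's ratio. The function name interpolate_ratios is hypothetical.

```python
def interpolate_ratios(yearly_ratios, center_month=6):
    """Monthly amplitude ratios by straight-line interpolation between years.

    yearly_ratios: one amplitude ratio per calendar year, in order.
    center_month: 0-based month on which each yearly ratio is centered
        (6 = July, the seventh month).
    """
    centers = [12 * k + center_month for k in range(len(yearly_ratios))]
    monthly = []
    for m in range(12 * len(yearly_ratios)):
        if m <= centers[0]:
            monthly.append(yearly_ratios[0])      # before the first center
        elif m >= centers[-1]:
            monthly.append(yearly_ratios[-1])     # after the last center
        else:
            k = (m - center_month) // 12          # year whose center precedes m
            fraction = (m - centers[k]) / 12
            step = yearly_ratios[k + 1] - yearly_ratios[k]
            monthly.append(yearly_ratios[k] + fraction * step)
    return monthly


# Ratios of 0.5 for 1952 and 1.5 for 1953, as in the example above.
monthly = interpolate_ratios([0.5, 1.5])
december_1952, january_1953 = monthly[11], monthly[12]
print(round(december_1952, 3), round(january_1953, 3))
```

With the ratios of 0.5 and 1.5 used earlier, the interpolated ratio moves only from about 0.92 in December 1952 to 1.00 in January 1953, instead of jumping from 0.5 to 1.5 at the turn of the year.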

Combinations of seasonal types. It is frequently true that the
seasonal variation of a series may be gradually changing in pattern,
shifting in its timing, or varying in amplitude, or some combination of the
three. For data showing shifts in timing and changes in amplitude, the
procedure for obtaining final seasonal indexes might be: (1) break data
into sub-periods according to occurrence of seasonal high; (2) compute
stable seasonal for each such sub-period; (3) using these seasonal indexes,
compute amplitude ratios for each year (possibly using the method of
interpolation described above); (4) multiply the seasonal index numbers
by the appropriate amplitude ratios.

Other combinations of seasonal behavior may call for different
treatment. Considerable ingenuity is frequently required to measure seasonal
variation successfully. Unfortunately, there is no way of telling when
we have arrived at the best solution of the problem. Complexity of
procedure does not guarantee that the results obtained accurately
describe the movement which we set out to measure. Particularly if the
data are originally unreliable, great refinement of method is likely to be
largely wasted effort.

Logical basis of methods of construction. With the exception of
the adjustment for Easter, the methods described in this chapter are more
or less empirical in nature, depending for their validity upon the results
which they produce. A method is held to be satisfactory if the
deseasonalized data: (1) do not show similarity of intra-year pattern (other than
cyclical) in different years; (2) are not extremely irregular in their
movements; and (3) are of about the same magnitude as the original data in
12-month periods.

The Easter adjustment, on the other hand, attempted to find a
functional relationship between April sales minus March sales and the date of
Easter. Carrying this idea further, it might be possible to find a numerical
relationship over time between length of daylight and sales of
incandescent lamps; or between temperature and sales of ice; or between a
combination of temperature and snowfall and sale of galoshes. Computation
of seasonal indexes by such a method would carry us far into the field
of correlation, which is treated in Chapters 19-22. Furthermore, it
would be difficult to measure the importance, let us say, of Christmas by
correlating sales with some other factor.

Intermediate between these two types of methods is that which obtains
a first-approximation seasonal index by an empirical method, and then
seeks to smooth this index by fitting a curve to the seasonal index numbers
on the theory that the seasonal movement would present a smooth
pattern if the period covered were long enough to permit an exact
cancelling out of all irregular movements. Freehand smoothing of the
seasonal curve is practiced by a few statisticians. The fitting of a
mathematical curve is not usually advocated. Not only may logical objections
be raised, but there may be social factors that disturb the smoothness of
contour inherent in a simple mathematical curve.



Symbols Used in Chapter 16


β₁: lower-case Greek beta, a measure of skewness. See Chapter 10.

β₂: lower-case Greek beta, a measure of kurtosis. See Chapter 10.

C: cyclical.

I: irregular.

N: the number of items in a series.

s: standard deviation. See Chapter 10.

S: seasonal.

Σ: upper-case Greek sigma, meaning "take the sum of."

T: trend.

X: a value of the X series.

y: a cyclical deviation: after irregular movements have been smoothed,
the deviation of a value in a time series from the combined estimate of
trend and seasonal.

Y_c: a computed value of the Y series.
CHAPTER 16


Analysis of Time Series:


CYCLICAL MOVEMENTS: ADJUSTING TIME
SERIES FOR TREND, SEASONAL, AND
IRREGULAR MOVEMENTS


In Chapter 11 it was pointed out that monthly time series are typically
the product of the four important movements: secular trend (T), seasonal
variation (S), cyclical movements (C), and irregular fluctuations (I).
Chapters 12 and 13 were devoted to consideration of types of trends, how
to select the appropriate type, and methods of trend fitting. Chapters 14
and 15 gave attention to types of seasonal variation and the determination
of indexes of seasonal variation. In this chapter, we shall first discuss the
elimination of trend from annual time series data. Following this, both
seasonal variation and trend will be eliminated from monthly data, and
irregular movements will be smoothed. The final result will be a set of
adjusted data showing primarily the cyclical movements of the series.

ADJUSTING ANNUAL DATA FOR TREND

It is, of course, obvious that annual data, which show but one figure
for each year, cannot contain any seasonal variation. Neither can annual
data show irregular movements, although it is possible for an episodic
movement (such as one due to a severe strike or a conflagration) to be
important enough to affect an annual total.

Table 12.2 showed the computations necessary for determining a
straight-line trend for magazine advertising for 1915-1949. The trend
values resulting from use of the equation were given in the last column 
of Table 12.2 for 1915-1953. Chart 12.3 showed both the observed 
annual data and the trend. Table 16.1 repeats the observed data for 
1915-1953 and the trend values for the same years. In Table 16.1 we 
have also computed the per-cent-of-trend values for each year. These 
Chart 16.1. Annual Data of Magazine Advertising in the United States
Adjusted for Trend, 1915-1953. The 100 per cent base is shown as a broken line
for 1950-1953 because the trend was fitted to the years 1915-1949 and extended to
1953. Data of Table 16.1.

are obtained by dividing each of the original figures by the corresponding
trend value and multiplying by 100. The results are shown in Chart
16.1. Annual data provide only very rough indicators of the fluctuations
TABLE 16.1


Adjustment for Trend of Data of Magazine Advertising in the United
States, 1915-1953

(Data and trend values in millions of agate lines)


Columns (repeated for each half of the table):
Year
Observed data, Y
Trend values, Y_c
Per cent of trend, 100(Y ÷ Y_c)

1!)I5 

10 

9 

21 2 

79 7 

1935 

23 

1 

33 0 

7; 0 

Itllil 

20 

0 

2! S 

9i 7 

P.UO 

2.S 

3 

33 (> 

SI 8 

19J7 

21 

.3 

22 1 

95 1 

1937 

•J2 

1 

31 2 

93, 9 

1918 

IS 

0 

22 9 

81 2 

1938 

23 

1 

3 1 7 

73 2 

!1>19 

2.5 


23, 3 

109 1 

l‘»,J9 

23 

0 

3.5 3 

72 3 

l')2(i 

.i3 

i) 

21 1 

139 I 

1010 

20 

9 

33 9 

71 9 

19‘2I 

22 

,3 

•21 7 

90 3 

1911 

27 


!ir, 3 

73 9 

1922 

24 

4 

23 3 

90 1 

1912 

23 

7 

37 1 

09 3 

1923 

30 

2 

23 

) 10 0 

I9i:i 

33 

i 

37 7 

87 8 

1921 

31 

1 

20 5 

1 18 5 

1911 

12 

0 

* 38 3 

j 109 7 

1925 

31 

5 

27 1 ' 

110 2 

1915 ' 

19 

0 

i 38 9 

120 0 

192ti 

3.3 

3 

27 7 j 

128 2 

1910 ! 

51 

S 

.39 5 

138 7 

1927 ; 

30 

3 

28 2 

129 4 

1917 

.50 

8 

' 40 0 

127 0 

1028 j 

30 

4 

i 28 8 

120 4 

19 48 1 

17 

S 

40 0 

117 7 

1929 I 

i dO 

0 

. 29 4 j 

138 1 

1919 j 

43 

8 

41 2 

100 3 

1930 : 

1 35 

8 

• 30 0 1 

1 19.3 

1950 ; 

45 

8* 1 

41 8 

109 0 

1931 i 

28 

9 

1 30 0 

9 4 4 

1951 * 

48 


42 4 

113 4 

1932 

i 21 


31 2 

07 9 

1952 . 

48 

3* 1 

1 43 0 

112 3 

1933 

i 18 

7 

1 31 8 

58 8 

1953 i 

50 

5* 1 

43 0 

115.8 

1934 

i 24 

3 

1 32 4 I 

75 0 







* Not used for computing trend.

Original data from various issues of Survey of Current Business. Trend values from Table 12.2.
of a time series, but Chart 16.1 shows that marked fluctuations have
occurred in annual magazine advertising linage.

In Table 16.1, trend was eliminated by division, rather than by
subtraction. If the trend values had been subtracted from the original
figures, the result would have been deviations in absolute terms (millions
of agate lines) rather than relative terms. For most purposes, it is more
useful to know whether the variations are large, or small, in relation to
some logical base, such as the trend. Thus, a deviation of 50 is ten times
as important when judged with respect to a trend value of 200 as it is
when compared with a trend value of 2,000.
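
The arithmetic of the two approaches can be set side by side in a brief sketch. The function names are hypothetical, and the observed value of 250 is invented for illustration; only the deviation of 50 and the trend values of 200 and 2,000 come from the text.

```python
def percent_of_trend(observed, trend):
    """Trend eliminated by division: 100(Y / Y_c)."""
    return 100.0 * observed / trend


def relative_deviation(deviation, trend):
    """An absolute deviation expressed as a percentage of the trend value."""
    return 100.0 * deviation / trend


# The same absolute deviation of 50 judged against two trend levels.
small_base = relative_deviation(50, 200)     # per cent of a trend value of 200
large_base = relative_deviation(50, 2000)    # per cent of a trend value of 2,000
print(small_base, large_base, small_base / large_base)

# An invented observation of 250 against a trend value of 200.
print(percent_of_trend(250, 200))
```

A deviation of 50 is 25 per cent of a trend value of 200 but only 2.5 per cent of a trend value of 2,000, which is the tenfold difference referred to above.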

ADJUSTMENT OF MONTHLY DATA

Although there are other methods of arriving at estimates of the
cyclical movements of time series, some of which are mentioned at the end
of this chapter, the so-called "residual method" is most commonly used.
This method consists of eliminating seasonal variation and trend, thus
obtaining the cyclical-irregular movements. Symbolically,¹

(T × S × C × I) ÷ S = T × C × I and

(T × C × I) ÷ T = C × I.

Next, the data are usually smoothed in order to obtain the cyclical
movements, which are sometimes termed the cyclical relatives, since they are
always percentages. It is because the cyclical-irregular or the cyclical
movements remain as residuals that this procedure is referred to as the
residual method.
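
A minimal sketch of the residual method follows. The figures are invented for illustration, and the 3-month moving average used for smoothing is an assumption made here only to complete the example; the function names are hypothetical.

```python
def deseasonalize(values, seasonal_index):
    """(T x S x C x I) / S: divide by the seasonal index and multiply by 100."""
    return [100.0 * y / s for y, s in zip(values, seasonal_index)]


def remove_trend(values, trend):
    """(T x C x I) / T: express each value as a per cent of its trend value."""
    return [100.0 * v / t for v, t in zip(values, trend)]


def smooth(values, span=3):
    """Centered moving average of odd span (an assumed smoothing device)."""
    half = span // 2
    return [sum(values[i - half:i + half + 1]) / span
            for i in range(half, len(values) - half)]


# Invented monthly figures built as T x S x C x I (S and C x I in per cent).
trend = [200.0, 202.0, 204.0, 206.0]
seasonal = [110.0, 90.0, 100.0, 105.0]
cyc_irr = [105.0, 95.0, 100.0, 102.0]
original = [t * s / 100 * c / 100 for t, s, c in zip(trend, seasonal, cyc_irr)]

tci = deseasonalize(original, seasonal)   # trend-cyclical-irregular
ci = remove_trend(tci, trend)             # cyclical-irregular, per cent of trend
cyclical = smooth(ci)                     # rough cyclical relatives
print([round(v, 1) for v in ci], [round(v, 1) for v in cyclical])
```

Dividing first by the seasonal index and then by the trend recovers the cyclical-irregular percentages that were used to build the invented series, and the moving average leaves two smoothed cyclical relatives.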

Deseasonalizing. As pointed out in Chapter 11, a seasonal index
may be computed for the purpose of studying the seasonal movement
itself, the objective being to avoid or minimize the consequences of the
seasonal changes, to smooth out the seasonal fluctuations, or to take
advantage of them. On the other hand, we may be interested in studying
a time series undisturbed by seasonal variation and this we accomplish
by adjusting the observed data for seasonal variation.

The computation of a seasonal index and its use to deseasonalize a set

¹ The concept of T × S × C × I is more generally useful than is that of T + S +
C + I. This is because S, C, and I tend to remain more nearly constant in
magnitude relative to trend rather than in absolute terms. Furthermore the movements
are ordinarily more meaningful when considered relative to each other than when
considered in absolute terms. Thus it is possible to compute a seasonal index which
remains constant over a period of years, to determine a seasonal index which is
changing because of alterations in the relative importance of the months, and to compare
the percentage fluctuations of cyclical movements. Occasionally series are
encountered for which better results are obtained if the seasonal movement is considered
constant in absolute rather than relative terms. This is discussed on pages 372
of monthly data may be but one step in the isolation of cyclical
movements, the other steps (to be described shortly) being the adjustment for
trend and the smoothing of irregular movements. Not infrequently,
however, it may be desired to study economic and business series adjusted
only for seasonal variation. Thus, businessmen, in making decisions,
may consider not so much whether their sales are increasing (or
decreasing) relative to a not-too-easily-visualized combination of trend and
seasonal movements, but rather in relation to the ordinarily expected
sales for the particular season of the year. It is of interest that many


THOUSANDS 
SMONT TONS 



1944 1945 I94« 4947 1948 1949 1950 1951 OSZ 

Chart 16.2. . CoiiHUinption of Newsprint b\ CiiiteU Slates PuhlisherH (Solid 
Line) an<l Deseasonalized Data (Broken Line), 1941 1952. Jjntn of Table 10 2. 


deseasonalized series appear in the Federal Reserve Bulletin, issued by the 
Board of Governors of the P>deral Reserve System, aiid in the Survey of 
Current Business, piiblivshed by the Office of Businc!^s Mconomics of the 
Department of Commerce. 

The elimination of seasonal variation is ordinarily accomplished by 
dividing the original values by the seasonal index (and multiplying 
the results by 100), as shown in Table 16.2 for the data of newsprint 
consumption. That is: $(T \times S \times C \times I) \div S = T \times C \times I$, so 
that the deseasonalized data contain trend, cyclical, and irregular movements. 
The deseasonalized data of Table 16.2, together with the original 
figures of newsprint consumption, are shown in Chart 16.2, where it is 
apparent that the curve of the deseasonalized data is much the smoother 
of the two. Because the period covered consists of but nine years, 
neither the original data nor the deseasonalized data show cyclical movements. 
The data of newsprint consumption were chosen as an illustration 
in Chapter 14, not because they would or would not show cyclical 
movements after seasonal variations were removed, but because the series 
had a clearly defined seasonal which, when tested by drawing the twelve 
monthly charts of the per-cent-of-moving-average data (like Chart 15.2), 
did not appear to change from year to year.* However, the curve of the 
deseasonalized data suggests that the seasonal index may not be quite as 
satisfactory for 1952 as for the earlier years, and if the analysis were to be 
continued for a number of years beyond 1952, the 12 monthly charts 
should, of course, be extended and re-examined. Incidentally, the peaks 
shown in the deseasonalized data for May 1949 and April 1951 do not 
represent residual seasonal fluctuations, but reflect unusually high 
original values for those months, as may be seen in Table 16.2. 

Chart 16.3. Year-Over-Year Chart of Deseasonalized Data of 
Consumption of Newsprint by United States Publishers, 1944-1952. 
Horizontal scale: months, January to December. Data of Table 16.2. 

* There was evidence of a slight increase in the seasonal importance of April and 
May and a slight decrease in the importance of July and August, but no clear evidence 
of a changing seasonal movement. 
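
The division just described can be sketched in a few lines of Python. This is only an illustration, assuming the constant seasonal index of Column 3 of Table 16.2 and the 1944 original values of Column 2; the function name `deseasonalize` is a hypothetical choice, not anything from the text.

```python
# Seasonal index for newsprint consumption (per cent), January to December,
# taken from Column 3 of Table 16.2.
SEASONAL_INDEX = [93.9, 90.3, 105.2, 104.7, 105.3, 98.9,
                  88.5, 93.1, 99.3, 110.4, 107.2, 103.5]

# Original data for 1944 (thousands of short tons), Column 2 of Table 16.2.
ORIGINAL_1944 = [194.7, 176.2, 201.7, 201.1, 197.4, 191.1,
                 174.9, 182.4, 189.6, 218.1, 211.6, 206.0]


def deseasonalize(original, seasonal_index):
    """Divide each value by its seasonal index and multiply by 100.

    (T x S x C x I) / S = T x C x I, so the result still contains
    trend, cyclical, and irregular movements.
    """
    return [round(y / s * 100, 1) for y, s in zip(original, seasonal_index)]


if __name__ == "__main__":
    for month, value in enumerate(deseasonalize(ORIGINAL_1944, SEASONAL_INDEX), start=1):
        print(month, value)
```

The results stay in thousands of short tons, because the index is a pure percentage.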
TABLE 16.2 

Elimination of Seasonal Variations from Data of Consumption of Newsprint 
by United States Publishers, 1944-1952 

(Original and deseasonalized data in thousands of short tons) 

Year and       Original    Seasonal    Deseasonalized data 
month            data        index       Col. 2 ÷ Col. 3 
(1)              (2)         (3)              (4) 

1944 
  January       194.7        93.9           207.3 
  February      176.2        90.3           195.1 
  March         201.7       105.2           191.7 
  April         201.1       104.7           192.1 
  May           197.4       105.3           187.5 
  June          191.1        98.9           193.2 
  July          174.9        88.5           197.6 
  August        182.4        93.1           195.9 
  September     189.6        99.3           190.9 
  October       218.1       110.4           197.6 
  November      211.6       107.2           197.4 
  December      206.0       103.5           199.0 
1945 
  January       185.2        93.9           197.2 
  February      175.1        90.3           193.9 
  March         202.8       105.2           192.8 
  April         203.2       104.7           194.1 
  May           205.8       105.3           195.4 
  June          190.5        98.9           192.6 
  July          177.9        88.5           201.0 
  August        202.9        93.1           217.9 
  September     213.3        99.3           214.8 
  October       236.9       110.4           214.6 
  November      236.1       107.2           220.2 
  December      225.4       103.5           217.8 
1946 
  January       221.1        93.9           235.5 
  February      223.2        90.3           247.2 
  March         267.7       105.2           254.5 
  April         259.0       104.7           247.4 
  May           261.5       105.3           248.3 
  June          259.3        98.9           262.2 
  July          243.1        88.5           274.7 
  August        257.3        93.1           276.4 
  September     265.6        99.3           267.5 
  October       292.2       110.4           264.7 
  November      291.5       107.2           271.9 
  December      294.8       103.5           284.8 
1947 
  January       266.4        93.9           283.7 
  February      258.4        90.3           285.2 
  March         302.7       105.2           287.7 
  April         297.5       104.7           284.1 
  May           303.0       105.3           287.7 
  June          292.7        98.9           296.0 
  July          263.7        88.5           298.0 
  August        281.1        93.1           301.9 
  September     299.8        99.3           301.9 
  October       339.3       110.4           307.3 
  November      338.0       107.2           315.3 
  December      322.1       103.5           311.2 
1948 
  January       292.5        93.9           311.5 
  February      297.4        90.3           329.3 
  March         338.3       105.2           321.6 
  April         342.6       104.7           327.2 
  May           348.8       105.3           331.2 
  June          327.1        98.9           330.7 
  July          291.6        88.5           329.5 
  August        314.0        93.1           337.3 
  September     337.2        99.3           339.6 
  October       381.7       110.4           345.7 
  November      364.3       107.2           339.8 
  December      363.7       103.5           351.4 
1949 
  January       332.7        93.9           354.3 
  February      308.8        90.3           342.0 
  March         366.9       105.2           348.8 
  April         368.9       104.7           352.3 
  May           392.2       105.3           372.5 
  June          349.9        98.9           353.8 
  July          313.1        88.5           353.8 
  August        318.0        93.1           341.6 
  September     356.5        99.3           359.0 
  October       399.3       110.4           361.7 
  November      378.6       107.2           353.2 
  December      372.5       103.5           359.9 
TABLE 16.2 (Concluded) 

Elimination of Seasonal Variations from Data of Consumption of Newsprint 
by United States Publishers, 1944-1952 

Year and       Original    Seasonal    Deseasonalized data 
month            data        index       Col. 2 ÷ Col. 3 
(1)              (2)         (3)              (4) 

1950 
  January       345.2        93.9           367.6 
  February      333.2        90.3           369.0 
  March         396.9       105.2           377.3 
  April         403.8       104.7           385.7 
  May           401.9       105.3           381.7 
  June          376.6        98.9           380.7 
  July          336.8        88.5           380.6 
  August        346.8        93.1           372.5 
  September     373.8        99.3           376.4 
  October       420.8       110.4           381.2 
  November      407.9       107.2           380.5 
  December      398.3       103.5           384.8 
1951 
  January       345.6        93.9           368.1 
  February      336.6        90.3           372.8 
  March         394.4       105.2           374.9 
  April         410.7       104.7           392.3 
  May           403.2       105.3           382.9 
  June          365.3        98.9           369.4 
  July          333.4        88.5           376.7 
  August        344.5        93.1           370.0 
  September     381.4        99.3           384.1 
  October       405.3       110.4           367.1 
  November      402.8       107.2           375.7 
  December      387.8       103.5           374.7 
1952 
  January       345.3        93.9           367.7 
  February      336.6        90.3           372.8 
  March         389.3       105.2           370.0 
  April         393.5       104.7           375.8 
  May           404.1       105.3           383.8 
  June          379.9        98.9           384.1 
  July          329.7        88.5           372.5 
  August     [illegible]     93.1        [illegible] 
  September     379.7        99.3           382.4 
  October    [illegible]    110.4        [illegible] 
  November      417.0       107.2           389.0 
  December      386.6       103.5           373.5 

Data from Tables 14.6 and 14.7. 
Test of seasonal. A practical test of a seasonal index is to see whether 
its use has eliminated all of the seasonal movement from the series. A 
chart of the type of Chart 16.2 may be used for this purpose, but a 
year-over-year chart of the deseasonalized data, Chart 16.3, is better. From 
this chart it may be seen that the fluctuations still present in the 
deseasonalized data are largely irregular movements which stand out because of 
the lack of cyclical fluctuations in the series. When residual seasonal 
movements are present in an adjusted series, the curves of a year-over-year 
chart will show some similarity with each other. 
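
A rough numerical companion to the year-over-year chart can be sketched as follows. This is not a procedure given in the text; it is an assumed shortcut in which each deseasonalized value is expressed as a percentage of its own year's mean and the percentages are averaged month by month. If some months stand consistently above or below 100, residual seasonal movement may be present, although trend within the year can also push later months above earlier ones. The function name is hypothetical.

```python
def residual_seasonal_profile(deseasonalized_by_year):
    """Average, month by month, each value's percentage of its own year's mean.

    deseasonalized_by_year maps a year label to a list of 12 monthly values.
    """
    profile = [0.0] * 12
    for values in deseasonalized_by_year.values():
        year_mean = sum(values) / 12
        for m, v in enumerate(values):
            profile[m] += v / year_mean * 100
    n_years = len(deseasonalized_by_year)
    return [round(p / n_years, 1) for p in profile]


if __name__ == "__main__":
    # Deseasonalized newsprint data for 1944 and 1945, Column 4 of Table 16.2.
    data = {
        1944: [207.3, 195.1, 191.7, 192.1, 187.5, 193.2,
               197.6, 195.9, 190.9, 197.6, 197.4, 199.0],
        1945: [197.2, 193.9, 192.8, 194.1, 195.4, 192.6,
               201.0, 217.9, 214.8, 214.6, 220.2, 217.8],
    }
    print(residual_seasonal_profile(data))
```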

Correction by subtraction of seasonal. It occasionally happens that 
grotesque results are obtained when seasonal is eliminated by dividing by 
a seasonal index. This is especially likely to be the case when the seasonal 
movement typically falls almost to zero at one or more months. 
Then, if in any given year the original data remain materially above zero 
for those months, division by the extremely low seasonal index percentage 
will raise the deseasonalized data to a very sharp peak. Even though a 
seasonal movement may not fall to or near zero, there are rare instances 
in which a seasonal pattern may be constant in absolute rather than 
relative terms. This will be apparent if the percentages of moving average 
tend to be large when the original data are at a low level and small 
when the original data are at a high level. 

A simple expedient is as follows. Compute a seasonal index by whatever 
method seems appropriate. The index is now converted into terms 
of the original data by multiplying the seasonal index numbers (expressed 
as percentage deviations) each year by the average value of the original 
series for that year. Seasonal is then eliminated by subtracting, 
algebraically, the seasonal index from the original data. 
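
The expedient can be sketched as below. The index and the year of data are hypothetical, chosen so that the seasonal low falls to 10 per cent; "percentage deviation" is taken here to mean the index minus 100, which is an assumption about the intended arithmetic.

```python
# Hypothetical seasonal index (per cent) that falls almost to zero in winter.
HYPOTHETICAL_INDEX = [10, 20, 60, 120, 180, 210, 210, 180, 120, 60, 20, 10]

# Hypothetical original data for one year; January is well above its usual low.
HYPOTHETICAL_YEAR = [30, 25, 62, 118, 177, 214, 205, 183, 121, 58, 22, 12]


def deseasonalize_by_division(original, seasonal_index):
    """Usual method: original / index x 100."""
    return [round(y / s * 100, 1) for y, s in zip(original, seasonal_index)]


def deseasonalize_by_subtraction(original, seasonal_index):
    """Convert the index to absolute amounts for this year, then subtract.

    The percentage deviation (index - 100) is multiplied by the year's
    average value, and the resulting amount is subtracted algebraically.
    """
    year_mean = sum(original) / len(original)
    adjusted = []
    for y, s in zip(original, seasonal_index):
        seasonal_amount = (s - 100.0) / 100.0 * year_mean
        adjusted.append(round(y - seasonal_amount, 1))
    return adjusted


if __name__ == "__main__":
    print(deseasonalize_by_division(HYPOTHETICAL_YEAR, HYPOTHETICAL_INDEX))
    print(deseasonalize_by_subtraction(HYPOTHETICAL_YEAR, HYPOTHETICAL_INDEX))
```

With these made-up figures, division by the January index of 10 triples the mild excess in January, whereas subtraction leaves it near the level of the other months.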

It may be desirable to compute the index number, in the first instance, 
in such a way as to obtain a seasonal index in absolute rather than relative 
terms. This will be so if the seasonal movements each year seem to be 
similar in absolute magnitude rather than in percentage deviations. 
Inspection of a chart of the original data may indicate whether this is 
true. If the evidence indicates that an index of absolute deviations 
should be computed, it is necessary only to adapt one of the methods with 
which the reader is already familiar. For instance, if the moving-average 
method is used, the moving average is subtracted from, instead of divided 
into, the original data; and the index from that point is constructed as 
usual, the final index being adjusted to total zero by the addition or 
subtraction of a correction factor. Incidentally, it might be noted that any 
of the devices explained in Chapter 14 may be based on the subtraction 
method of computing seasonal. The link-relative method (described in 
Chapter 14) can also be adapted very easily as follows: (1) obtain link 
differences by subtracting the preceding month from each month; (2) 
average these link differences, month by month; (3) let the first-month 
link difference be zero, and chain the links by successive addition; (4) 
correct chain differences for (upward) trend by successive subtraction of a 
correction factor; (5) adjust chain differences to total zero by addition or 
subtraction of a constant correction factor. 
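
The five steps of the link-difference adaptation might be sketched as follows. Several details are assumptions not fixed by the description above: the arithmetic mean is used for averaging in step (2), and in step (4) the trend correction is found by chaining once more from December to a second January and spreading that discrepancy evenly over the twelve months. The function name is hypothetical.

```python
def link_difference_seasonal(monthly_by_year):
    """Seasonal index in absolute terms by the link-difference method.

    monthly_by_year: list of 12-value lists for consecutive years, January first.
    At least two years are needed so that a December-to-January link exists.
    """
    if len(monthly_by_year) < 2:
        raise ValueError("need at least two consecutive years")
    flat = [v for year in monthly_by_year for v in year]

    # (1) Link differences: each month minus the preceding month.
    diffs_by_month = [[] for _ in range(12)]
    for i in range(1, len(flat)):
        diffs_by_month[i % 12].append(flat[i] - flat[i - 1])

    # (2) Average the link differences month by month (arithmetic mean assumed).
    avg_link = [sum(d) / len(d) for d in diffs_by_month]

    # (3) Let January be zero and chain the links by successive addition.
    chain = [0.0]
    for m in range(1, 12):
        chain.append(chain[-1] + avg_link[m])

    # (4) Trend correction: a second January, reached through the
    #     December-to-January link, should again be zero; remove the
    #     discrepancy in equal monthly steps.
    second_january = chain[11] + avg_link[0]
    step = second_january / 12
    corrected = [chain[m] - m * step for m in range(12)]

    # (5) Adjust so that the twelve values total zero.
    mean = sum(corrected) / 12
    return [c - mean for c in corrected]
```

For a series built from a straight-line trend plus a fixed absolute seasonal pattern, this sketch returns the pattern itself.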

Adjusting for seasonal and trend. To serve as an illustration 
for most of the balance of this section, we shall use the data of magazine 
advertising linage, for which the trend was ascertained in Chapter 12 and 
for part of which a moving seasonal index was computed in Chapter 15. 
The usual procedure consists, first, of removing the seasonal fluctuations, 
giving 

$$(T \times S \times C \times I) \div S = T \times C \times I$$ 

and, next, eliminating trend to give 

$$(T \times C \times I) \div T = C \times I.$$ 

We shall use the data of magazine advertising linage from January 1921 
to December 1953. The original, unadjusted data are shown in Chart 
16.4. The removal of seasonal variation is accomplished exactly as 
described for the data of consumption of newsprint, by dividing the original 
data by the seasonal index. This procedure is indicated in Table 
16.3. For magazine advertising, the seasonal indexes used were: (1) a 
constant index for the period 1921-1929, (2) a different constant index for 
1930-1937, (3) a moving seasonal index for 1938-1952, and (4) the 1952 
values repeated for 1953. The use of the 1952 seasonal index for 1953 
follows the usual practice when it is not possible (because of unavailability 
of subsequent data) to extend the moving seasonal index. The determination 
of the 1943-1952 portion of the moving seasonal index was 
described in the preceding chapter, the index appearing in Table 15.3. 
The seasonal indexes were shown graphically in Chart 11.9. The 
deseasonalized data of magazine advertising are shown in Column 4 of 
Table 16.3 and in Chart 16.4. 

The next step consists of eliminating trend, the procedure being the 
same as that shown in Table 16.1, except that we are now dealing with 
monthly data and must put the trend equation into monthly terms. 
Note that while our present illustration concerns the years 1921-1953, the 
trend equation was fitted to the period 1915-1949 and was extended 
through 1953. On page 276 the trend, in monthly terms, was found to be 

$$Y_c = 2.6028 + 0.004074X.$$ 

Origin, July 1932. X units, 1 month. 
TABLE 16.3 

Adjustment of Data of United States Magazine Advertising for Seasonal 
Variation and for Trend, 1921-1953 

(Original data, deseasonalized data, and trend values in thousands of agate lines.) 

Column headings: 
(1) Year and month 
(2) Original data, T × S × C × I 
(3) Seasonal index, S 
(4) Deseasonalized data, T × C × I: [Col. (2) ÷ Col. (3)] × 100 
(5) Trend values, T 
(6) Cyclical-irregular percentages, C × I: Col. (4) ÷ Col. (5) 

(1)                (2)          (3)        (4)          (5)       (6) 

1921 
  January         1,979         84.8      2,334        2,041     114.4 
  February        1,981         97.2      2,038        2,045      99.7 
  March           2,005        106.6      1,881        2,049      91.8 
  April           2,099        118.2      1,776        2,053      86.5 
  May             2,145        113.5      1,890        2,057      91.9 
  June            1,933        102.6      1,884        2,061      91.4 
  July            1,573         81.6      1,928        2,065      93.4 
  August          1,402         72.0      1,947        2,069      94.1 
  September       1,620         91.2      1,776        2,073      85.7 
  October         1,824        111.9      1,630        2,077      78.5 
  November        1,903        114.1      1,668        2,081      80.2 
  December        1,807        106.3      1,700        2,085      81.5 
1922 
  January         1,632         84.8      1,925        2,089      92.1 
  February        1,768         97.2      1,819        2,094      86.9 
  March           1,922        106.6      1,803        2,098      85.9 
  April           2,171        118.2      1,837        2,102      87.4 
  May             2,215        113.5      1,952        2,106      92.7 
  June            2,046        102.6      1,994        2,110      94.5 
  July            1,705         81.6      2,089        2,114      98.8 
  August          1,566         72.0      2,175        2,118     102.7 
  September       1,940         91.2      2,127        2,122     100.2 
  October         2,470        111.9      2,207        2,126     103.8 
  November        2,466        114.1      2,161        2,130     101.5 
  December        2,464        106.3      2,318        2,134     108.6 
1930 
  January      [illegible]      75.1   [illegible]     2,481   [illegible] 
  February     [illegible]      96.2   [illegible]     2,485     136.5 
  March           3,416        107.5      3,178        2,489     127.7 
  April           3,877        123.6      3,137        2,493     125.8 
  May             3,639        122.0      2,983        2,497     119.5 
  June            3,354        111.1      3,019        2,501     120.7 
  July            2,451         83.3      2,942        2,505     117.4 
  August          2,057         71.7      2,869        2,509     114.3 
  September       2,598         87.7      2,962        2,513     117.9 
  October         3,021        107.9      2,800        2,517     111.2 
  November        3,042        110.5      2,753        2,521     109.2 
  December        2,820        103.3      2,730        2,525     108.1 
1931 
  January         2,001         75.1      2,664        2,529     105.3 
  February        2,539         96.2      2,639        2,534     104.1 
  March           2,762        107.5      2,569        2,538     101.2 
  April           3,026        123.6      2,448        2,542      96.3 
  May             2,971        122.0      2,435        2,546      95.6 
  June            2,732        111.1      2,459        2,550      96.4 
  July            1,998         83.3      2,399        2,554      93.9 
  August          1,713         71.7      2,389        2,558      93.4 
  September       2,069         87.7      2,359        2,562      92.1 
  October         2,480        107.9      2,298        2,566      89.6 
  November        2,444        110.5      2,212        2,570      86.1 
  December        2,170        103.3      2,101        2,574      81.6 
TABLE 16.3 (Concluded) 

Adjustment of Data of United States Magazine Advertising for Seasonal 
Variation and for Trend, 1921-1953 

Column headings: 
(1) Year and month 
(2) Original data, T × S × C × I 
(3) Seasonal index, S 
(4) Deseasonalized data, T × C × I: [Col. (2) ÷ Col. (3)] × 100 
(5) Trend values, T 
(6) Cyclical-irregular percentages, C × I: Col. (4) ÷ Col. (5) 

(1)                (2)          (3)        (4)          (5)       (6) 

1950 
  January         3,261         89.0      3,664        3,458     106.0 
  February        3,868        103.4      3,741        3,462     108.1 
  March           4,270        114.8      3,720        3,466     107.3 
  April           4,482        114.0      3,932        3,471     113.3 
  May             3,853        101.7      3,789        3,475     109.0 
  June            2,974         79.0      3,765        3,479     108.2 
  July            3,175         80.0      3,969        3,483     114.0 
  August          3,791         98.2      3,860        3,487     110.7 
  September       4,505        116.0      3,884        3,491     111.3 
  October         4,602        120.2      3,829        3,495     109.6 
  November        3,958        104.1      3,802        3,499     108.7 
  December        3,106         79.6      3,902        3,503     111.4 
1951 
  January         3,520         88.7      3,968        3,507     113.1 
  February        4,050        102.6      3,947        3,511     112.4 
  March           4,464        115.5      3,865        3,515     110.0 
  April           4,531        114.2      3,968        3,519     112.8 
  May             3,926        100.9      3,891        3,524     110.4 
  June            3,221         79.4      4,057        3,528     115.0 
  July            3,260         79.7      4,090        3,532     115.8 
  August          3,931         98.0      4,011        3,536     113.5 
  September       4,845        118.2      4,099        3,540     115.8 
  October         4,849        120.2      4,034        3,544     113.8 
  November        4,129        103.4      3,993        3,548     112.5 
  December     [illegible]  [illegible] [illegible]    3,552     118.9 
1952 
  January         3,466         88.0      3,939        3,556     110.8 
  February        3,985        101.5      3,926        3,560     110.3 
  March           4,855        117.7      4,125        3,564     115.7 
  April           4,468        114.3      3,909        3,568     109.6 
  May             4,093        100.5      4,073        3,572     114.0 
  June            3,213         79.9      4,021        3,576     112.5 
  July            3,133         79.2      3,956        3,581     110.5 
  August          3,960         97.9      4,045        3,585     112.8 
  September       4,798        119.2      4,025        3,589     112.1 
  October         4,898        120.0      4,082        3,593     113.6 
  November        4,299        103.0      4,174        3,597     116.0 
  December        3,162         78.8      4,013        3,601     111.4 
1953 
  January         3,667         88.0      4,167        3,605     115.6 
  February        4,251        101.5      4,188        3,609     116.0 
  March           4,991        117.7      4,240        3,613     117.4 
  April           4,699        114.3      4,111        3,617     113.7 
  May             4,445        100.5      4,423        3,621     122.1 
  June            3,360         79.9      4,205        3,625     116.0 
  July            3,205         79.2      4,047        3,629     111.5 
  August          4,136         97.9      4,225        3,634     116.3 
  September       4,965        119.2      4,165        3,638     114.5 
  October         5,230        120.0      4,358        3,642     119.7 
  November        4,406        103.0      4,278        3,646     117.3 
  December        3,161         78.8      4,011        3,650     109.9 

Magazine advertising linage from various issues of the Survey of Current Business. Seasonal index: 
fixed for 1921-1929 and for 1930-1937 from Table 110 of the first edition of this text, changing for 
1938-1942 from worksheets not shown, changing for 1943-1952 from Table 15.3, 1953 same as 1952. 
Trend values from the equation given on page 373. 
Now, the equation just given is in terms of millions of agate lines, while 
the monthly data of Table 16.3 are in terms of thousands of agate lines. 
Therefore, the equation becomes 

$$Y_c = 2{,}602.8 + 4.074X,$$ 

with the same origin and X units as before. 
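
As a small illustration, the monthly trend values can be generated from the equation in thousands of agate lines, $Y_c = 2{,}602.8 + 4.074X$, with X counted in months from the July 1932 origin. The function name and the way X is derived from a calendar date are assumptions made for this sketch.

```python
def trend_value(year, month):
    """Trend in thousands of agate lines for a given calendar month.

    X is the number of months from the origin, July 1932 (X = 0),
    negative for earlier months.
    """
    x = (year - 1932) * 12 + (month - 7)
    return 2602.8 + 4.074 * x


if __name__ == "__main__":
    for year, month in [(1921, 1), (1932, 7), (1950, 1), (1953, 12)]:
        print(year, month, round(trend_value(year, month)))
```

January 1921 lies 138 months before the origin, so X = -138 there, and January 1950 lies 210 months after it.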

The trend values shown in Column 5 of Table 16.3 were obtained from 
this equation. Now, the deseasonalized values in Column 4 of Table 
16.3 are each divided by the corresponding trend value [$(T \times C \times I) \div 
T = C \times I$] to produce the cyclical-irregular values in Column 6 of the 
table. These cyclical-irregular values are shown in Chart 16.5. It is 
interesting to note that the values shown in Column 6 of Table 16.3 are 
percentages, not thousands of agate lines. When seasonal movements 
are eliminated by dividing by a seasonal index (which is a series of 
percentages), the deseasonalized data are always in the same units as were the 
original data. Trend, however, is in terms of the original units, so that 
when the trend of a series is eliminated by dividing, the resulting figures 
are percentages. 

In Table 16.3 the cyclical-irregular movements were obtained by 
eliminating, first, seasonal variation and then trend. In symbols, the 
procedure was 

$(T \times S \times C \times I) \div S = T \times C \times I$, the deseasonalized data, and 

$(T \times C \times I) \div T = C \times I$, the cyclical-irregular movements. 

If it were desirable to do so, we could, of course, eliminate first trend and 
then seasonal variation, thus, 

$(T \times S \times C \times I) \div T = S \times C \times I$, the data adjusted for trend, and 

$(S \times C \times I) \div S = C \times I$, the cyclical-irregular movements. 

Another possibility consists of multiplying together the trend and seasonal 
values (the seasonal percentages being used as decimal ratios) and 
eliminating both of those movements at the same time. In symbols, 
this is 

$(T \times S \times C \times I) \div (T \times S) = C \times I$, the cyclical-irregular movements. 

Table 16.4 illustrates these three possible procedures for magazine 
advertising linage for 1952. Note that the final results by the three methods, 
which are shown in Column 6 of each part of Table 16.4, either agree 
exactly or occasionally differ by 0.1 because of rounding. 
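
The three procedures can be compared in a short sketch. It uses the January to March 1952 figures for magazine advertising (original data, seasonal index, and trend) and rounds intermediate results the way the table columns are rounded, which is why the answers may differ by 0.1. The function names are hypothetical.

```python
# (month, original data, seasonal index in per cent, trend value) for 1952.
ROWS_1952 = [
    ("January", 3466, 88.0, 3556),
    ("February", 3985, 101.5, 3560),
    ("March", 4855, 117.7, 3564),
]


def method_1(y, s, t):
    """Seasonal first, then trend."""
    deseasonalized = round(y / s * 100)        # T x C x I, thousands of lines
    return round(deseasonalized / t * 100, 1)  # C x I, per cent


def method_2(y, s, t):
    """Trend first, then seasonal."""
    pct_of_trend = round(y / t * 100, 1)       # S x C x I, per cent
    return round(pct_of_trend / s * 100, 1)    # C x I, per cent


def method_3(y, s, t):
    """Trend and seasonal removed together."""
    normal = round(t * s / 100)                # T x S, thousands of lines
    return round(y / normal * 100, 1)          # C x I, per cent


if __name__ == "__main__":
    for month, y, s, t in ROWS_1952:
        print(month, method_1(y, s, t), method_2(y, s, t), method_3(y, s, t))
```

Without the intermediate rounding the three functions would be algebraically identical.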

Of the three procedures for adjusting for seasonal variation and trend, 
the one first described is most frequently used, since it is often desired 
TABLE 16.4 

Three Methods of Obtaining Cyclical-Irregular Movements of United States 
Magazine Advertising for 1952 

I. Adjustment for seasonal variation and then for trend. 

Column headings: (1) Month; (2) Original data, T × S × C × I; (3) Seasonal index, S; 
(4) Deseasonalized data, T × C × I: [Col. (2) ÷ Col. (3)] × 100; (5) Trend values, T; 
(6) Cyclical-irregular percentages, C × I: Col. (4) ÷ Col. (5). 

(1)             (2)        (3)       (4)       (5)       (6) 
January        3,466       88.0     3,939     3,556     110.8 
February       3,985      101.5     3,926     3,560     110.3 
March          4,855      117.7     4,125     3,564     115.7 
April          4,468      114.3     3,909     3,568     109.6 
May            4,093      100.5     4,073     3,572     114.0 
June           3,213       79.9     4,021     3,576     112.5 
July           3,133       79.2     3,956     3,581     110.5 
August         3,960       97.9     4,045     3,585     112.8 
September      4,798      119.2     4,025     3,589     112.1 
October        4,898      120.0     4,082     3,593     113.6 
November       4,299      103.0     4,174     3,597     116.0 
December       3,162       78.8     4,013     3,601     111.4 

II. Adjustment for trend and then for seasonal variation. 

Column headings: (1) Month; (2) Original data, T × S × C × I; (3) Trend values, T; 
(4) Per cent of trend, S × C × I: Col. (2) ÷ Col. (3); (5) Seasonal index, S; 
(6) Cyclical-irregular percentages, C × I: Col. (4) ÷ Col. (5). 

(1)             (2)        (3)       (4)       (5)       (6) 
January        3,466      3,556      97.5      88.0     110.8 
February       3,985      3,560     111.9     101.5     110.2 
March          4,855      3,564     136.2     117.7     115.7 
April          4,468      3,568     125.2     114.3     109.5 
May            4,093      3,572     114.6     100.5     114.0 
June           3,213      3,576      89.8      79.9     112.4 
July           3,133      3,581      87.5      79.2     110.5 
August         3,960      3,585     110.5      97.9     112.9 
September      4,798      3,589     133.7     119.2     112.2 
October        4,898      3,593     136.3     120.0     113.6 
November       4,299      3,597     119.5     103.0     116.0 
December       3,162      3,601      87.8      78.8     111.4 

III. Adjustment for combined trend and seasonal movements. 

Column headings: (1) Month; (2) Original data, T × S × C × I; (3) Trend values, T; 
(4) Seasonal index, S; (5) "Normal" values, T × S: Col. (3) × Col. (4); 
(6) Cyclical-irregular percentages, C × I: Col. (2) ÷ Col. (5). 

(1)             (2)        (3)       (4)       (5)       (6) 
January        3,466      3,556      88.0     3,129     110.8 
February       3,985      3,560     101.5     3,613     110.3 
March          4,855      3,564     117.7     4,195     115.7 
April          4,468      3,568     114.3     4,078     109.5 
May            4,093      3,572     100.5     3,590     114.0 
June           3,213      3,576      79.9     2,857     112.5 
July           3,133      3,581      79.2     2,836     110.5 
August         3,960      3,585      97.9     3,510     112.8 
September      4,798      3,589     119.2     4,278     112.2 
October        4,898      3,593     120.0     4,312     113.6 
November       4,299      3,597     103.0     3,705     116.0 
December       3,162      3,601      78.8     2,838     111.4 

Data from sources given below Table 16.3. 
to study a series adjusted for seasonal variation as well as to observe the 
cyclical-irregular movements. Since one rarely is interested in adjusting 
a monthly series for trend alone, the second procedure is not often used. 
If the sole purpose of the analysis is to obtain the cyclical-irregular 
movements (either as a final objective or as a step toward getting the cyclical 
movements), the third method shown in Table 16.4 will be slightly less 
time-consuming than either of the others, since most types of calculating 
machines can more quickly perform the series of multiplications which 
replaces one of the two series of divisions present in the other methods. 

However the cyclical-irregular movements are obtained, those values 
are often referred to as percentages of "normal." The term "normal" 
is frequently used in economics, business, psychology, statistics, and in 
other fields, and it is not always used with the same meaning. In this 
instance, "normal" refers to the combined trend and seasonal movements 
of a series, the thought being that from a long-run point of view it is 
normal for an industry to increase (or decrease) in some steady fashion, 
and that from a short-run viewpoint it is normal for seasonal variation 
to be present. Taken together, both movements are "normal." 

Smoothing irregular movements. The interplay of a multitude 
of forces, other than those already eliminated, is largely responsible for 
the irregular movements which are usually to be seen in the curve of a 
series adjusted for seasonal variation and trend. The irregular fluctuations 
in magazine advertising linage are apparent in Chart 16.5. Occasionally, 
irregular fluctuations may occur because the seasonal index 
which was used was not as good as might be desired. Earlier consideration 
of the seasonal index for magazine advertising linage has indicated 
that it was satisfactory. 

Irregular fluctuations cannot be completely eliminated from a series 
without the accompanying danger of over-smoothing. However, the 
irregular movements can be smoothed, so as to bring the cyclical movements 
into clearer relief, by the use of a short-term moving average. 
From an examination of Chart 16.5 it appears that most of the irregular 
movements are of one month's duration, although occasionally, as in late 
1927 and early 1928, they appear to last longer than one month. To 
smooth out these movements, we could use a two-month moving average, 
except that the values of such an average should be plotted between each 
pair of months. If we were to average three months, the average would 
appropriately fall opposite the center month, but we would encounter 
another serious predicament: if the first and third months were high and 
the second month low, the resulting average would be high; if the first 
and third months were low and the second month high, the average 
would be low. A three-month average would therefore sometimes introduce 
reverse fluctuations into the series. Both of the foregoing difficulties 
may be overcome by using a three-month moving average weighted 
1, 2, 1, which is, of course, a centered two-month moving average. Table 
16.5 indicates how this average is obtained: first a three-month moving 
total weighted 1, 2, 1 is gotten for the cyclical-irregular values, and then 


TABLE 16.5 

Computation of Cyclical Movements for Data of United States Magazine 
Advertising, 1921-1953 

Column headings: 
(1) Year and month 
(2) Cyclical-irregular percentages, C × I 
(3) Three-month moving total weighted 1, 2, 1 of Col. (2) 
(4) Cyclical percentages, C: Col. (3) ÷ 4 

(1)               (2)        (3)        (4) 

1921 
  January        114.4        ..         .. 
  February        99.7      405.6      101.4 
  March           91.8      369.8       92.4 
  April           86.5      356.7       89.2 
  May             91.9      361.7       90.4 
  June            91.4      368.1       92.0 
  July            93.4      372.3       93.1 
  August          94.1      367.3       91.8 
  September       85.7      344.0       86.0 
  October         78.5      322.9       80.7 
  November        80.2      320.4       80.1 
  December        81.5      335.3       83.8 
1922 
  January         92.1      352.6       88.2 
  February        86.9      351.8       88.0 
  March           85.9      346.1       86.5 
  April           87.4      353.4       88.4 
  May             92.7      367.3       91.8 
  June            94.5      380.5       95.1 
  July            98.8      394.8       98.7 
  August         102.7      404.4      101.1 
  September      100.2      406.9      101.7 
  October        103.8      409.3      102.3 
  November       101.5      415.4      103.8 
  December       108.6      434.1      108.5 
1952 
  January        110.8      450.8      112.7 
  February       110.3      447.1      111.8 
  March          115.7      451.3      112.8 
  April          109.6      448.9      112.2 
  May            114.0      450.1      112.5 
  June           112.5      449.5      112.4 
  July           110.5      446.3      111.6 
  August         112.8      448.2      112.0 
  September      112.1      450.6      112.6 
  October        113.6      455.3      113.8 
  November       116.0      457.0      114.2 
  December       111.4      454.4      113.6 
1953 
  January        115.6      458.6      114.6 
  February       116.0      465.0      116.2 
  March          117.4      464.5      116.1 
  April          113.7      466.9      116.7 
  May            122.1      473.9      118.5 
  June           116.0      465.6      116.4 
  July           111.5      455.3      113.8 
  August         116.3      458.6      114.6 
  September      114.5      465.0      116.2 
  October        119.7      471.2      117.8 
  November       117.3      464.2      116.0 
  December       109.9        ..         .. 

Cyclical-irregular percentages from Table 16.3. 

the moving-total values are each divided by 4 to arrive at the moving
average. The moving totals should be obtained by use of an adding
machine, each total being obtained separately and not by use of successive
subtotals as was done when we computed a 13-month weighted moving
total in Table 14.5. The moving averages should be obtained from the
moving totals by multiplying by 0.25, rather than by dividing by 4, since
most calculating machines will produce the results faster when a constant
multiplier is used. Note that the figures in the second column of Table
16.5 are the same as those in Column 6 of Table 16.3. In actual practice,
Columns 3 and 4 of Table 16.5 would be included as additional columns
of Table 16.3. Two separate tables are shown here because of the
difficulty of showing so large a table on the printed page of this text. Note
that there will be no three-month moving-average figure for the first
month and the last month of a series.
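
The weighted moving average just described can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
the function name and the five sample values are hypothetical (they are
not the figures of Table 16.5), and the weight pattern is assumed to be
of odd length so that each average can be centered on a month.

```python
def weighted_moving_average(values, weights=(1, 2, 1)):
    """Centered weighted moving average for an odd-length weight pattern.

    Months without a full window (the ends of the series) are None.
    """
    half = len(weights) // 2
    multiplier = 1.0 / sum(weights)  # 0.25 when the weights are 1, 2, 1
    averages = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        # Each moving total is computed separately, not from subtotals.
        moving_total = sum(w * v for w, v in zip(weights, window))
        averages[i] = moving_total * multiplier
    return averages


# Hypothetical cyclical-irregular percentages, for illustration only.
ci = [91.0, 93.0, 89.0, 95.0, 97.0]
three_month = weighted_moving_average(ci)
five_month = weighted_moving_average(ci, weights=(1, 2, 4, 2, 1))
```

As in the text, the first and last months receive no three-month
figure; with a five-month pattern the first two and last two months
receive none.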

The result of smoothing the cyclical-irregular values by the use of a
three-month moving average weighted 1, 2, 1 is shown in Chart 16.6.
It is clear that this curve is much smoother than the curve of Chart 16.5,
although there are a few spots where the moving average was of too short
duration to smooth out the irregular fluctuations completely. Irregular
movements are not often entirely eliminated from a series. Their
complete elimination may call for freehand smoothing or use of a moving
average of longer duration than three months. In any event, the
smoothing process must not hide the turning points of the cyclical movements.
Since a four-month moving average would have the same shortcomings
as a two-month moving average, the practicable moving average, next
longer in duration than the one used in Table 16.5, would be a (weighted)
five-month moving average. Five-month moving-average values are
set opposite the third month of each set of five months. The months are
often weighted 1, 2, 4, 2, 1, which gives greatest weight to the center
month and least weight to the end months. Since this weight pattern
totals 10, the moving averages may be computed from the moving totals
without use of a calculating machine.

The irregular movements. The irregular movements themselves may
be obtained by dividing the cyclical-irregular values shown in Column 2
of Table 16.5 by the cyclical values, which are in Column 4 of the same
table. The computation of the irregular movements is not shown, but
Chart 16.7 shows these, month by month, and Chart 16.8 gives a
frequency distribution of the irregular variations. If the irregular
movements were of a random character, they might be expected to form a
normal curve. Although the curve of Chart 16.8 is nearly symmetrical
(jSi = 0.0005). it is leptoknrtic, having ^2 — 3 00. If the devwxtJon of 
— ITl, not sliown in Chart 16.8, is ineiu(l(‘d in the computations, both 
skewness and leptokurtosis are increased. This is the sort of frequency
distribution to be expected for the irregular movements of a time series,
since, in addition to minor fluctuations, there are ordinarily others that
are episodic in nature and the effects of which may continue (or cumulate)
over several months. The data of magazine advertising are rather "well
behaved" in this respect, the deviations continuing on the same side of
the zero line of Chart 16.8 for five months at a time only once, for four
months at a time only three times, and for three months at a time only
eight times.
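
The division that isolates the irregular movements, and the counting
of months that stay on one side of the zero line, can be sketched as
follows. The function names and the sample values are hypothetical;
deviations are taken here as the irregular percentage minus 100, which
is an assumption about how the charted percentage deviations were
formed.

```python
from itertools import groupby


def irregular_percentages(ci, c):
    """Divide each cyclical-irregular value by the matching cyclical value."""
    return [100.0 * a / b for a, b in zip(ci, c)]


def run_lengths(deviations):
    """Lengths of consecutive runs on one side of zero (zeros break a run)."""
    signs = [(d > 0) - (d < 0) for d in deviations]
    return [len(list(group)) for sign, group in groupby(signs) if sign != 0]


# Hypothetical values, for illustration only.
ci = [102.0, 99.0, 101.0, 104.0, 97.0]   # cyclical-irregular percentages
c = [101.0, 100.0, 100.0, 102.0, 99.0]   # smoothed (cyclical) percentages
irregular = irregular_percentages(ci, c)
deviations = [value - 100.0 for value in irregular]
runs = run_lengths(deviations)
```

Counting how many entries of `runs` equal 3, 4, or 5 gives tallies of
the kind quoted above for the advertising series.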


[Chart 16.8: frequency histogram. Vertical axis: number of months.
Horizontal axis: percentage deviation, from -6.5 to +6.5.]

Chart 16.8. Frequency Distribution of Irregular Movements in
Advertising in the United States, 1921-1953. The irregular movements are
I = (C × I) ÷ C. Data computed from Columns 2 and 4 of Table 16.5 and
from worksheets (not shown) for the years omitted from that table.

Comparing cyclical movements. One reason for wishing to isolate
cyclical movements in a time series is the desire to compare them with the
cyclical movements in one or more other series. Occasionally it may be
thought that one series more or less consistently precedes another at its
cyclical turning points. However, when two series differ in regard to
the amplitude of their fluctuations, some difficulty is experienced in
comparing the timing of those fluctuations. The more marked the difference
in amplitudes, the more important it is to make some sort of an
adjustment for that difference.

As an illustration we shall use the Index of Durable Manufactures and
the Index of Nondurable Manufactures for January 1940-December
1953, both of which are issued by the Board of Governors of the Federal

This is not easy to see from the chart. The counts were made from the
data upon which the chart is based.

Lead-lag relationships are discussed in Chapter 22.

DATA FOR INDEX NUMBERS

Although the method of combining the variables is of considerable 
importance in constructing index numbers, it is insignificant when com- 
pared with the problem of selecting the data that are the raw materials 
of the index. Too much emphasis cannot be put upon this point. The
data must be accurate and homogeneous, and the sample representative.
A sample cannot be expected to be representative unless an adequate
number of items is included. To state the idea in other language: a 
sufficiently large sample of relevant items must be selected to obtain 
reliable index numbers. 

As noted before, the commodities to be chosen for a price index, and
the type of quotation to be selected, depend on what is being measured.
A wholesale price index requires wholesale prices. An index of prices
paid by consumers necessitates not only retail prices of food, but rents,
gas and electric rates, clothing prices, transportation, medical care, and
so forth, applying to the class of persons for whom the cost of living is to
be ascertained. An index of the changing cost of constructing frame
houses in Atlanta, Georgia, should include those materials and items of
labor that are used in frame houses built in Atlanta. The prices should
be the Atlanta prices of those materials and the wages should be the
wages in Atlanta of the kind of labor used. These examples indicate
one reason why it is important to bear in mind at all times the purpose
for which the index is being compiled. The purpose of the index and just
what it seeks to measure will also influence the selection of the base, the
weights used, and the formula employed.

When selecting the sources of data for index numbers, we may rely on
regularly published quotations or obtain periodic special reports from the
merchants, producers, exporters, or others who possess the basic
information needed. Under either circumstance, we must make sure that the
data pertain strictly to the thing being measured. Thus, if retail food
price changes are being measured, quotations should be from
supermarkets, chain stores, independent stores, and any other important
outlets. These different sources should not be mixed indiscriminately,
but should be appropriately weighted when combined. Neither should
first-of-the-month quotations, middle-of-the-month quotations, and
end-of-the-month quotations ordinarily be combined in one index.

The discussion immediately following is in part an application of
principles discussed in earlier chapters of this book, especially Chapter 2.
The great importance of the proper choice of data for index numbers
justifies a bringing together of these principles, even though some
duplication is involved.



402 


INDEX NUMBER CONSTRUCTION 


[Chap. 17 


Accuracy. Some statistical data that appear in precise printed form
cannot be depended upon. If the person or company reporting the data
uses the data for operational or tax purposes, they are likely to be
accurate; but if the data are merely statistical reports furnished to an outside
agency, they may be compiled originally by careless and indifferent clerks
whose sole interest is in filling the form with ink marks as quickly as
possible. It therefore behooves the statistician to ascertain how the data
are collected, and to select his source with discrimination.

Comparability. Standard grades of the same commodity are, of
course, comparable between different dates; however, a 1911 automobile
cannot be compared with a present-day automobile. Nor could the price
of a "standard" automobile be computed for different years, since in not
more than one year could such a standard automobile ordinarily be found.
In the case of highly manufactured goods, which are further developed
over the years, the upward bias of price quotations is greatest; but it is
present, also, in the case even of some agricultural commodities, since
their production, also, involves more processing in later than in earlier
years. It is likely, therefore, that most price index numbers have an
upward bias.

A similar problem arises when one article passes out of wide use and its
place is taken by a different commodity serving somewhat the same
purpose. For instance, the stagecoach of 100 years ago has been superseded
by the streamlined air-conditioned train, the pressurized plane, and the
de luxe bus. If we should find that the fare from Washington, D. C., to
Philadelphia were the same in the two periods, we should not conclude
that the cost of the same service had remained the same, because the
service, too, has changed. Less time is required to make the trip and it
is now made in much greater comfort.

Representativeness. Since index numbers are usually obtained from
samples, we must try to obtain a sample that behaves like the population
from which it is drawn. Probably the most satisfactory way of
accomplishing this is to divide the original data into groups and subgroups
and to draw a representative sample from each of these. Stratification
into groups and subgroups is employed because the various groups and
subgroups of commodities, affected by different economic factors, may
be expected to display patterns of behavior which are distinctive to each
group and different from other groups and from the over-all index.
For example, if an index of wholesale prices is being made, we should
expect price (or quantity) movements of foods to be different from those
of building materials. One reason for this is that the demand for food
products is inelastic, while that for building materials (which are durable
goods, the purchase of which can be postponed) is elastic. Furthermore,
the supply of foods, over short periods of time, is dependent to a
considerable extent on the weather, while the supply of building materials
is subject to conscious control of the fabricators.

In choosing the commodities from a group, it is desirable to pick ones
which tend to conform most closely to the central tendency of the group,
if that central tendency can be determined. Having selected
commodities that are reasonably representative of the group from which they were
picked, it is desirable to ascertain whether proportionate representation
has been obtained for each group. If, upon the basis of dollar value, the
sample for one group (or groups) constitutes too small or too large a
proportion of the entire group, commodities may be added to or dropped
from the group sample. When such an adjustment is not feasible (for
example, if the group were "structural steel" and the sample constituted
100 per cent of the group), an alternative consists of applying appropriate
weights.

A further test of the representativeness of the sample can sometimes
be employed: Do the value changes of the sample coincide with those of
the population? This test should be applied not only to the whole
sample, but to the various groups and subgroups into which it is divided.²

Adequacy. In Chapter 21 it will be shown that the reliability of the
arithmetic mean of a random sample is directly related to the square root
of the number of items included. Furthermore, in a finite population,
the larger the proportion of items included in the sample (see Appendix S,
Section 21.2), the more reliable is the mean of the sample. The absolute
number of items to use cannot be stated in precise and fixed terms. As
just noted, commodities (items) are ordinarily selected from the various
component groups, so that the sample is a stratified one rather than a
random one. Furthermore, in selecting the items from the groups, the
more important items are ordinarily chosen first, after which as many
suitable items are included as resources will permit. Thus, the items are
not taken at random within each stratum. As a result of these two
situations, ordinary reliability formulas are not applicable.
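
For orientation only, the "ordinary" reliability formula for the mean
of a simple random sample can be sketched as below. This is the
familiar textbook expression (standard deviation divided by the square
root of the number of items, with a finite-population correction); it
is offered as an assumption about what is meant, it is not taken from
Appendix S, and, as just stated, it does not apply to the stratified,
non-random samples used for index numbers. The function name and the
numbers are hypothetical.

```python
import math


def standard_error_of_mean(sample_sd, n, population_size=None):
    """Standard error of the mean of a simple random sample.

    If population_size is given, the usual finite-population correction
    sqrt((N - n) / (N - 1)) is applied.
    """
    se = sample_sd / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se


# Quadrupling the number of items halves the standard error.
se_25 = standard_error_of_mean(10.0, 25)
se_100 = standard_error_of_mean(10.0, 100)

# Sampling a larger share of a finite population reduces it further.
se_25_of_100 = standard_error_of_mean(10.0, 25, population_size=100)
```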

For the index-number illustrations used in the remainder of this
chapter, five citrus fruits have been selected: grapefruit, lemons, and three
categories of oranges. For each of these, except grapefruit, the
production figures refer to total production. For grapefruit, the production is
for Florida grapefruit only. The prices for all five fruits are the auction
prices per box on the New York market. The use of these figures involves
some artificiality, first because the total production was used, including
not only "production having value," but also fruit consumed on the farm,
donated to charity, or unharvested or not utilized on account of economic
conditions, as well as fruit used for juice, concentrates, and so on; second,
the price quotation is the average per box for the season at just one
market and does not take account of prices at the other nine auction
markets in the United States, except as they are reflected in the New
York market. For these reasons, the various indexes computed in the
following pages of this chapter must be considered merely as illustrations
of the behavior of the various formulas and weighting schemes which are
discussed.

² This test is similar to Irving Fisher's "total value criterion," which states that the
price index multiplied by the quantity index should equal the ratio of change of the
total value of the population. See Irving Fisher, "The Total Value Criterion,"
Journal of the American Statistical Association, Vol. XXII, December 1927, pp.
419-441.

The season for each fruit begins with the bloom of one year and ends
with the completion of the harvest the following year. As explained
below Table 17.2, "1953" indicates the crop year 1952-1953, and
similarly for other years. The fruits used for the calculations which
follow, their seasons, and the weight per box are:


Fruit                      Season                     Net contents
                                                      per box

Grapefruit, Florida        Sept. 1 to July 31         80 pounds
Lemons, California         Nov. 1 to Oct. 31          79 pounds
Oranges, Florida           Oct. 1 to July 31          90 pounds
Oranges, California,       Oct. 1 to Dec. 31 of
  both varieties             following year           77 pounds

SELECTION OF BASE

Regardless of the formula employed for weighting and combining the
data, it is customary (although not necessary) to select some period of
time as 100 per cent with which to compare the other index numbers. A
month is ordinarily too short a period to use as base period, since any one
month is likely to be unusual on account of accidental or seasonal
influences. A year is sometimes used. However, it is often true that no one
year is sufficiently "normal" to be a good basis of comparison. Business
and prices are always advancing or receding with the business cycle.
Though not so specific, an average of several years is usually a better
base. The period 1910 through 1914 has sometimes been used as a price
base, while the 1923-1925 average has been used for quantity indexes.
In the past two decades, the statistical agencies of the United States
Government have successively shifted to several other bases: for example,
1926, 1935-1939, 1947-1949, and special-purpose ones, such as September
1, 1939 and June 1950. A useful solution is to employ the period of years
that is used by some of the other indexes with which the one being
constructed is likely to be employed.

Although a particular base may be satisfactory for a number of years,
that base becomes less meaningful as time passes, and it eventually
becomes desirable to shift to a more recent period. Among the reasons
are: (1) the dispersion of price relatives may become so great that no
average is reliable; (2) because of permanent currency depreciation,
growth of population, technological developments, and other reasons,
new and higher levels may have been attained by income, prices,
production, and consumption; (3) the pattern of consumption may change to
such an extent that no aggregate of commodities can be found which
includes the major expenditures common to both periods; (4) the quality
of many commodities, nominally the same, changes progressively with
time. An indirect basis of comparison may be had by utilizing a chain
index system, which involves, essentially, the comparison of each year (or
subdivision thereof) with the preceding year. This method, which is not
completely satisfactory, is explained in the following chapter.
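
A minimal sketch may make the two ideas concrete: shifting an existing
index to a more recent base, and chaining year-to-year comparisons.
Both functions are illustrative assumptions (simple proportional
rebasing and multiplication of successive links); they are not
presented as the procedures of the following chapter, and the index
values are hypothetical.

```python
def shift_base(index, new_base_period):
    """Re-express an index so that new_base_period equals 100."""
    base_value = index[new_base_period]
    return {period: 100.0 * value / base_value
            for period, value in index.items()}


def chain_index(link_relatives, first_value=100.0):
    """Chain link relatives (each year as a per cent of the preceding year)."""
    chained = [first_value]
    for link in link_relatives:
        chained.append(chained[-1] * link / 100.0)
    return chained


# Hypothetical index on a 1948 base, shifted to a 1950 base.
old_index = {1948: 100.0, 1949: 110.0, 1950: 121.0}
new_index = shift_base(old_index, 1950)

# Two successive years, each 10 per cent above the year before.
chained = chain_index([110.0, 110.0])
```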

AGGREGATIVE PRICE INDEX NUMBERS

It has already been stated that there are two methods of constructing
index numbers: (1) by computing aggregate values; (2) by averaging
relatives. By the first method, as will be explained in this section, the prices
or quantities are made comparable, are automatically weighted by being
reduced to dollar values, and then are combined into aggregate values. In
the following section the method of averaging relatives will be explained.
There it will be shown that the two methods are, under certain conditions,
merely alternative methods of obtaining the same result. The
aggregative method obtains the result directly, and produces a result that has a
simple and clear meaning; the method employing relatives is more
roundabout, and its meaning is more technical. Nevertheless, there are
situations in which the aggregative method is not applicable, and recourse
must then be had to the averaging of relatives.

Simple aggregates. Table 17.2 illustrates the construction of a
simple aggregative price index. The prices of each commodity in any
given year are merely added together to give the index number for that
year. It is then frequently convenient to designate some year as a base,
which is set equal to 100. In this illustration all of the index numbers are
expressed in the final row as a percentage of the 1948 number, found by
dividing each one of the numbers by the value in the base period ($23.01)
and multiplying by 100.
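
The arithmetic of a simple aggregative index can be sketched as
follows. The prices are hypothetical round figures chosen for easy
checking; they are not the quotations of Table 17.2, and the function
name is likewise only illustrative.

```python
def simple_aggregate_index(prices_by_year, base_year):
    """Sum the unit prices for each year and express each sum as a
    percentage of the base-year sum."""
    aggregates = {year: sum(prices.values())
                  for year, prices in prices_by_year.items()}
    base = aggregates[base_year]
    return {year: 100.0 * total / base for year, total in aggregates.items()}


# Hypothetical prices per box, in dollars.
prices_by_year = {
    1948: {"grapefruit": 3.00, "lemons": 7.00, "oranges": 5.00},
    1953: {"grapefruit": 4.00, "lemons": 8.00, "oranges": 6.00},
}
index = simple_aggregate_index(prices_by_year, base_year=1948)
```

Here the aggregates are $15.00 and $18.00, so the 1953 index number is
18.00 divided by 15.00, multiplied by 100.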

It must be apparent that the influence which a commodity exerts on a
simple aggregate index depends on the price per unit of quotation. In
this instance, the predominant item was lemons; if grapefruit or Florida
oranges had been quoted at wholesale by the carload instead of by the
box, they would largely have determined the course of the index. The
weighting of an aggregative index by one commercial unit of each
commodity represented, then, is illogical in that it neglects to consider the
actual importance of the different commodities; it is haphazard in that
the relative influence of the different commodities is determined by factors
quite irrelevant to the purpose of the price index. The problem would
in no sense be solved if all commodities were reduced to a price per
pound, for some commodities, such as diamonds, are very costly per
pound and yet are not very important in our economic life, while coal,
which is of tremendous importance, is relatively cheap per pound.
Furthermore, some goods, such as electric power or human labor, cannot
be reduced to a pound basis. Still another solution is to take as the unit
of quotation the amount that can be purchased for one dollar in the base
year. But this is scarcely more logical, since it would be very unusual
if the same amount of money were spent on each commodity in every
year.
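
The dependence of a simple aggregate on the unit of quotation is easy
to demonstrate. In the sketch below the two commodities, their prices,
and the figure of 100 boxes to the carload are all hypothetical; only
the unit in which commodity A is quoted differs between the two
computations.

```python
def simple_aggregate(p_given, p_base):
    """Simple aggregative index: sum of given-period prices as a
    percentage of the sum of base-period prices."""
    return 100.0 * sum(p_given.values()) / sum(p_base.values())


# Hypothetical prices per box: A doubles in price, B is unchanged.
base_per_box = {"A": 3.00, "B": 7.00}
given_per_box = {"A": 6.00, "B": 7.00}
index_by_box = simple_aggregate(given_per_box, base_per_box)

# The same prices, but A is now quoted per carload (assumed 100 boxes).
boxes_per_car = 100
base_mixed = {"A": 3.00 * boxes_per_car, "B": 7.00}
given_mixed = {"A": 6.00 * boxes_per_car, "B": 7.00}
index_by_carload = simple_aggregate(given_mixed, base_mixed)
```

Quoted by the box, the index is 130; quoted by the carload, commodity A
dominates and the index approaches 200, although no price has changed.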

Before consideration of the construction of weighted aggregative index
numbers, it may be helpful to state symbolically the method we have just
used. The formula is

$$P = \frac{\sum p_n}{\sum p_o},$$

where P means price index, p refers to the price of an individual
commodity, the subscript o refers to the base period, from which price changes
are measured, and the subscript n refers to the given period which is
being compared with the base. Now if the formula for a particular year
(say 1953, with 1948 being the base) is to be stated, it could be written

$$P_{53} = \frac{\sum p_{53}}{\sum p_{48}}.$$

Weighted aggregates. In order to allow each commodity to have a
reasonable influence on the index, it is advisable to use a deliberately
weighted rather than a simple aggregate of prices, which, as we have seen,
involves concealed weighting. To construct a weighted aggregative
index, a list of definite quantities of specified commodities is taken, and
calculations are made to determine what this aggregate of goods is worth
each year at current prices. Obviously the process is merely that of
multiplying each unit price by the number of units and summing the
resulting values for each period. The procedure, using the quantities
produced in 1948 as multipliers, is illustrated in Table 17.3. The reader,
having followed the reasoning to this point, will realize now that
aggregative index numbers of price measure the changing value of a fixed aggregate of
goods. Since the total cost or value changes while the components of the
aggregate do not, these changes must be due to price changes. It appears
that this type of index number measures the very thing sought if we wish
to determine changes in the cost of living, that is, the cost of a fixed
"market basket" of goods and services. The general formula for the
aggregative price index is

$$P = \frac{\sum p_n q}{\sum p_o q}.$$

The symbols are those used earlier, but a new one has been added: q refers
to the quantity of the commodity produced, marketed, or consumed (that
is, the quantity weight, or multiplier). Since the index numbers
constructed in Table 17.3 were weighted by base-year quantities, we may
write the formula more specifically

$$P = \frac{\sum p_n q_o}{\sum p_o q_o}.$$

TABLE 17.2

Construction of Simple Aggregative Index Numbers of Citrus Fruit Prices,
1948-1953

[Prices per box for each crop year, 1948 through 1953. Rows: Grapefruit;
Lemons; Oranges, Florida; Oranges, California, Navel; Oranges,
California, Valencia; Aggregate; Index number (per cent of 1948). The
1948 aggregate is $23.01; the other figures are not legible in this copy.]

* The crop year 1947-1948 is designated 1948, and similarly for other
years, since most harvesting and consequently the marketing occurs in
the latter year.

Data from U.S. Department of Agriculture, Agricultural Statistics, and
Bureau of Agricultural Economics, Crop Reporting Board, "Citrus Fruits:
Production, Farm Disposition, Value, and Utilization of Sales."

TABLE 17.3

Construction of Aggregative Index Numbers of Citrus Fruit Prices,
1948-1953, Weighted by Production in 1948*

(Production in thousands of boxes; values in thousands of dollars)

[Columns: production in 1948; value of 1948 production at the prices of
each year, 1948 through 1953. Rows: Grapefruit; Lemons; Oranges,
Florida; Oranges, California, Navel; Oranges, California, Valencia;
Aggregate value; Index number (per cent of 1948). The figures are not
legible in this copy.]

* See note to Table 17.2 concerning crop years.

Based on price data in Table 17.2 and production from Agricultural
Statistics, p. 103.
Comparing Tables 17.2 and 17.3, it will be seen that, in the simple
aggregative index, lemons were of greatest importance because they had
the highest price per box; but, when base-year quantity weights were
introduced, Florida oranges became most important.
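
The effect of introducing base-year quantity weights can be sketched
with two commodities. The index computed is the sum of given-period
prices times base-period quantities, taken as a percentage of the sum
of base-period prices times the same quantities. All prices and
quantities below are hypothetical, not those of Tables 17.2 and 17.3.

```python
def aggregative_index(p_given, p_base, quantities):
    """Weighted aggregative price index: value of a fixed list of
    quantities at given-period prices, as a percentage of its value at
    base-period prices."""
    given_value = sum(p_given[k] * quantities[k] for k in quantities)
    base_value = sum(p_base[k] * quantities[k] for k in quantities)
    return 100.0 * given_value / base_value


# Hypothetical prices per box (dollars) and base-year production (boxes).
p_base = {"lemons": 7.00, "oranges": 3.00}
p_given = {"lemons": 7.00, "oranges": 4.50}
q_base = {"lemons": 10, "oranges": 60}

weighted = aggregative_index(p_given, p_base, q_base)

# For contrast, the simple aggregate gives each commodity one box.
one_box_each = {"lemons": 1, "oranges": 1}
simple = aggregative_index(p_given, p_base, one_box_each)
```

The higher-priced lemons dominate the simple aggregate, which rises
only to 115; once production weights are used, the rise in orange
prices carries the index to 136.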

Selection of weights. Although in the preceding illustration 1948
quantities were used as weights, this simple procedure is but one of several
possible systems. It would have been just as easy to have taken, say,
1953 quantities as weights. If the quantity of each commodity marketed
changed from year to year in the same proportion, it would make no
difference to what period the weights referred, for the results would be
identical. In fact, however, the relative importance of the different
commodities is constantly changing, and this is due in part to the change
in the relative prices of the different commodities, which in turn results
from changes in supply and demand. Therein lies a great source of
difficulty for which there is no completely satisfactory solution. The
answer depends in part on what the analyst thinks a price index is
supposed to do.

One view is that such an index number measures the changing cost of a
constant aggregate of goods. Another view concerns itself not with the
goods level of analysis, but with the satisfactions level; an index number,
according to this view, should measure the changing cost of aggregates of
goods yielding the same utility or satisfaction at two periods, or two
places. Thus, suppose we compare the cost of living of two groups of
similar persons at two periods (or places), these groups having at the two
periods (or places) the same tastes and capacity for enjoyment, as well as
an income that will purchase, and does purchase, the same amount of
satisfaction. The commodities, of course, will be different, but if the
expenditures were $4,000 the first year and $4,800 the second year, we
may conclude that the cost of living has gone up 20 per cent. It goes
without saying that no one has accurately made a measurement of this
kind. Although it seems feasible to measure only the varying value of a
fixed aggregate of goods, yet the analyst should select a list of goods that
will avoid the certainty of bias in a known direction with respect to the
cost of obtaining equal satisfactions at different times. The following
suggestions have been made for solving this knotty problem.

See J. M. Keynes, A Treatise on Money, Vol. I, pp. 96-99. Harcourt, Brace &
Co., New York, 1930.

1. Use base-period quantities as weights. This is the method we have
used for illustrative purposes in Table 17.3. However, even if there has
been no change in the tastes or environment of purchasers between the
two periods, purchases of those commodities that have increased
relatively in price will decline relatively, and purchases of commodities that
have decreased relatively in price will increase relatively. It is entirely
possible that this type of index might record an increase in the price level,
whereas by increasing the relative amounts purchased of commodities
that decline in price, the same amount of satisfaction might actually be
bought by a given individual at a lower total cost. This type of index,
then, has in a sense an upward bias. It might be said that this index
marks an upper limit to the price change. This method is sometimes
known as Laspeyres' method, and, as previously stated, can be defined
symbolically,

$$P = \frac{\sum p_n q_o}{\sum p_o q_o}.$$

2. Use given-period quantities. That is, use the weights that pertain
to the year which is to be compared with the base period. This method
involves the selection of a new set of weights each year, or even more
often. But frequently it is impossible to obtain current quantity weights,
and, even if they are available, the labor of computation is approximately
doubled. Furthermore, although each period is thereby directly
comparable with the base year, the comparison of the different years among
themselves is not valid, for the reason that the aggregate of goods differs
each year.

If we think of 1948 as being the base period for an index of consumers'
prices, the base-year weighting system answers the question: If it cost
me $100 a month to live in 1948, how much would it cost me this year to
live the way I did that year? The given-year weighting system answers
a different question: If I could have supported my present scale of living
in 1948 with $100 per month, how much must I spend this year? A
theoretical objection to asking such a question is that undue weight is
given to the commodities that have declined in price. It is the relative
decline in price that may be responsible for their increased purchase, and,
although it is price change which we are trying to measure, yet our
weighting is partly determined by relative price changes. Thus this
method may be said to have a downward bias, and marks the lower limit
of price change. It is sometimes known as Paasche's method and has the
following formula:

$$P = \frac{\sum p_n q_n}{\sum p_o q_n}.$$
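
The contrast between base-period and given-period weighting can be
sketched with two commodities. Base-period weighting values the
base-period quantities at both sets of prices; given-period weighting
values the given-period quantities at both sets of prices. The figures
are hypothetical and are arranged so that purchases shift away from
the commodity whose price has risen.

```python
def aggregative_index(p_given, p_base, quantities):
    """Value of a fixed list of quantities at given-period prices, as a
    percentage of its value at base-period prices."""
    given_value = sum(p_given[k] * quantities[k] for k in quantities)
    base_value = sum(p_base[k] * quantities[k] for k in quantities)
    return 100.0 * given_value / base_value


# Hypothetical data: the price of A doubles, and buyers shift toward B.
p_base = {"A": 1.00, "B": 1.00}
p_given = {"A": 2.00, "B": 1.00}
q_base = {"A": 10, "B": 10}
q_given = {"A": 5, "B": 15}

base_weighted = aggregative_index(p_given, p_base, q_base)    # method 1
given_weighted = aggregative_index(p_given, p_base, q_given)  # method 2
```

With these figures the base-weighted index is 150 and the
given-weighted index is 125, which illustrates the upper and lower
limits spoken of in the text.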

3. Use the average (or total) quantities of base and given years. This is
a compromise solution, although it is one which has no general bias in
any known direction. But again, as in method 2, we have shifting
weights and a resulting lack of comparability among the different years.
The method was proposed independently by the English economists
Marshall and Edgeworth, and the formula

$$P = \frac{\sum p_n (q_o + q_n)}{\sum p_o (q_o + q_n)}$$

is sometimes called the Marshall-Edgeworth formula.
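
A brief sketch of this compromise follows: each commodity is weighted
by the sum of its base-period and given-period quantities. The data
are hypothetical (the same invented figures could be used to compute
base-weighted and given-weighted indexes of 150 and 125), and the
function name is only illustrative.

```python
def marshall_edgeworth_index(p_given, p_base, q_base, q_given):
    """Aggregative index weighted by the total of base-period and
    given-period quantities for each commodity."""
    weights = {k: q_base[k] + q_given[k] for k in q_base}
    given_value = sum(p_given[k] * weights[k] for k in weights)
    base_value = sum(p_base[k] * weights[k] for k in weights)
    return 100.0 * given_value / base_value


# Hypothetical data: the price of A doubles, and buyers shift toward B.
p_base = {"A": 1.00, "B": 1.00}
p_given = {"A": 2.00, "B": 1.00}
q_base = {"A": 10, "B": 10}
q_given = {"A": 5, "B": 15}

compromise = marshall_edgeworth_index(p_given, p_base, q_base, q_given)
```

The result, 137.5, lies between the base-weighted and given-weighted
figures for the same data.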

4. Average together the quantities for all the years which the index numbers
include. Though perhaps an excellent solution for a historical study, this
plan is impracticable if the index is to be kept up to date, since it means
current revision of weights and continuous recomputation of the complete
set of index numbers.

5. Average together the quantities of several years which are thought to be
typical. This again is a compromise solution, but it is practical and is
very frequently adopted. The list of quantities used will, however,
eventually become obsolete. When that is the case, a new index can be
constructed and spliced to the old one. Methods for so doing will be
considered in the following chapter. The construction of an index number
of 1953 citrus fruit prices, using as weights the average quantities for 1948,
1949, and 1950, is illustrated in Table 17.4. The index number varies
only five-tenths from that employing base-year weights. The formula
for this particular index number may be written

$$P_{53} = \frac{\sum p_{53}\, q_{48\text{-}50}}{\sum p_{48}\, q_{48\text{-}50}}.$$

Of course, the results are the same whether average-quantity or
total-quantity weights are used.
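
That average-quantity and total-quantity weights give the same index
follows because dividing every weight by the same number of years
cancels between numerator and denominator. The sketch below checks
this with hypothetical prices and quantities for three weight years.

```python
def aggregative_index(p_given, p_base, quantities):
    """Value of a fixed list of quantities at given-period prices, as a
    percentage of its value at base-period prices."""
    given_value = sum(p_given[k] * quantities[k] for k in quantities)
    base_value = sum(p_base[k] * quantities[k] for k in quantities)
    return 100.0 * given_value / base_value


# Hypothetical production in three weight years.
q_by_year = {
    1948: {"A": 10, "B": 20},
    1949: {"A": 12, "B": 22},
    1950: {"A": 14, "B": 24},
}
commodities = ["A", "B"]
q_total = {k: sum(q[k] for q in q_by_year.values()) for k in commodities}
q_average = {k: q_total[k] / len(q_by_year) for k in commodities}

# Hypothetical prices in the base year and the given year.
p_base = {"A": 2.00, "B": 1.00}
p_given = {"A": 3.00, "B": 1.00}

index_total = aggregative_index(p_given, p_base, q_total)
index_average = aggregative_index(p_given, p_base, q_average)
```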

6. Determine the highest common factor. The weights are the quantities
of each commodity common to each year, either to the base and the given
year, or to all the years under comparison. In the latter case, this would
mean that, for any commodity, the smallest amount marketed in any of
the years under comparison would be taken. Usually, then, the
quantities of the different commodities taken would not each be for the same
year. This ingenious device has been suggested by J. M. Keynes to
avoid the sort of bias inherent in methods 1 and 2, already described. Its
virtue is its modesty: the device avoids trying that which cannot be done
perfectly. However, if the values of quantities that are common to the
different periods are small compared with total expenditures, or if they
constitute in different periods a varying proportion of the total, or if the
satisfaction derived from this aggregate of goods varies, the method is no
more accurate and, quite likely, is less accurate than method 5.

TABLE 17.4

Construction of 1953 Aggregative Index Number of Citrus Fruit Prices,
Weighted by Production* in 1948, 1949, and 1950

(Production in thousands of boxes; values in thousands of dollars)

                                                                                              Value of 1948-1950
                                                                                              average production
                                       Production           Total      Average   Price per box    at price in
Fruit                            1948     1949     1950   production  production  1948    1953    1948      1953
Grapefruit                     33,000   30,200   24,200     87,400     29,130    $3.30   $4.40   96,129   128,172
Lemons                         12,870   10,010   11,360     34,240     11,410     6.82    7.61   77,816    86,830
Oranges, Florida               58,400   58,300   58,500    175,200     58,400     3.41    4.36  199,144   254,624
Oranges, California, Navel     18,900   11,910   15,630     46,440     15,480     5.16    5.33   79,877    82,508
Oranges, California, Valencia  26,930   25,100   26,230     78,260     26,090     4.32    5.77  112,709   150,539
Aggregate value                  ...      ...      ...        ...        ...       ...     ...  565,675   702,673
Index number (per cent of 1948)  ...      ...      ...        ...        ...       ...     ...    100.0     124.2

* The index number is the same whether the weights used are total or average production for the
three years. See note to Table 17.2 concerning crop years.

Data from sources given below Table 17.3 and from Agricultural Statistics, 1950, p. 193, and 1951,
p. 178.


7. Make two index numbers, each with a different set of weights, and
average the two together, usually geometrically. The two systems of
weighting chosen are ordinarily base- and given-year weights. The formula
then becomes

$$P = \sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n}}$$

It is frequently called Fisher's "ideal" index number, because it conforms
to certain tests of consistent behavior which Irving Fisher considered
appropriate. On the other hand, it is difficult to say precisely just what
such an index number does measure.

A general criticism of any weighting system which involves the use of a
different set of weights for each index number is that, although each index
number may validly be compared with that of the base year, logically the
index numbers of no other two years (such as 1952 and 1953) can be
compared with each other. This criticism applies to given-year weights, to
the average of base- and given-year weights, to the highest-common-factor
method when the quantities selected are common only to the two
years being compared, and to the "ideal" index number. It does not
apply to base-year weights, average weights of all years, typical weights,
or the highest-common-factor method when the quantities common to
all years are used.

Ibid., pp. 105-109.

See Irving Fisher, The Making of Index Numbers, Houghton Mifflin Company,
Boston, 1927, p. 220. In Chapter IV Professor Fisher discusses these tests.

Although the theory of weight selection is interesting and involves logical
analysis of a high order, it is easy to overestimate its practical
importance. Consider the following results obtained from the citrus fruit data:

System of weighting                               1953 index number
Simple aggregative                                      119.4
1948 quantity weights (base-year weights)               123.7
1948-1950 average quantity weights                      124.2
1953 quantity weights (given-year weights)              124.5
"Ideal" index number                                    124.1

In this case there is a very great difference between the simple and the
weighted index numbers, but little difference between the systems of
weighting. The different weight systems substantially agree because the
importance of the weights relative to each other was about the same in
the four systems. If, however, both the prices and quantities had varied
greatly in their relative magnitude, the different weightings might have
given markedly different results. If all prices moved in the same direction
and changed at the same ratio, it would make no difference what
system of weighting were chosen. But if it happens that commodities
which are changing in relative importance during the period are
also undergoing price changes materially different from the average, then
the matter of weighting becomes important. It is usually of slight
importance whether exact weights are used, or only approximate weights.
Thus, Table 17.5 is exactly like Table 17.4 except that the quantity
weights are rounded to one digit, but the results vary by only 0.25. The
explanation is that the rounding did not appreciably change the relative
importance of the weights. For all practical purposes, sufficiently
accurate results will usually be obtained if exact weights are given to
the few more important commodities, and rounded weights to the
numerous unimportant commodities.

Irving Fisher recommends that the quantities be rounded to 1, 10, 100, or 1,000.
This, of course, materially lightens the work. In rounding any quantity between 1
and 10 (for instance), the dividing point is not the arithmetic mean of these two
numbers, but the geometric mean, 3.1623, since this involves the smallest relative
error. See ibid., pp. 346 and 432.

Although only approximate accuracy is necessary in choosing weights,
accuracy in price quotations is, in practice, of much greater importance.
This, of course, results from the fact that some prices are apt to show
marked changes from year to year, while others change little. This is
the same as saying that the ratio of the prices to each other changes from
year to year.
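The comparison of weighting systems made above can be checked with a short computation. The following is a minimal sketch, not the authors' worksheet: it assumes the 1948 and 1953 prices per box and the production figures (thousands of boxes) given in the chapter's citrus fruit tables, and the function name `aggregative` is introduced here only for illustration.

```python
# Prices in dollars per box; quantities in thousands of boxes.
# Order: grapefruit, lemons, Florida oranges, California navels, California Valencias.
p48 = [3.30, 6.82, 3.41, 5.16, 4.32]
p53 = [4.40, 7.61, 4.36, 5.33, 5.77]
q48 = [33000, 12870, 58400, 18900, 26930]
q49 = [30200, 10010, 58300, 11910, 25100]
q50 = [24200, 11360, 58500, 15630, 26230]
q53 = [32500, 11900, 72200, 16630, 28700]


def aggregative(p_given, p_base, weights):
    """Weighted aggregative price index, as a percentage of the base year."""
    numerator = sum(p * w for p, w in zip(p_given, weights))
    denominator = sum(p * w for p, w in zip(p_base, weights))
    return 100 * numerator / denominator


simple = 100 * sum(p53) / sum(p48)                      # unweighted aggregate
base_year = aggregative(p53, p48, q48)                  # base-year weights
total_48_50 = [a + b + c for a, b, c in zip(q48, q49, q50)]
typical = aggregative(p53, p48, total_48_50)            # 1948-1950 weights
given_year = aggregative(p53, p48, q53)                 # Paasche's method
ideal = (base_year * given_year) ** 0.5                 # Fisher's "ideal"
one_digit = aggregative(p53, p48, [30000, 10000, 60000, 20000, 30000])

for label, value in [("simple", simple), ("base-year", base_year),
                     ("1948-1950", typical), ("given-year", given_year),
                     ("ideal", ideal), ("one-digit weights", one_digit)]:
    print(f"{label:18s} {value:7.2f}")
```

Total rather than average 1948-1950 production is used as the weight; as the note to Table 17.4 observes, the index is the same either way, because a constant divisor cancels between numerator and denominator.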

TABLE 17.5

Construction of Index Number of Citrus Fruit Prices, Weighted by Average
Production* in 1948, 1949, and 1950 Rounded to One Digit

(Production in thousands of boxes; values in thousands of dollars)

                                 1948-1950                      Value of 1948-1950
                                  average      Price per box    average production
                                 production                        at price in
Fruit                            (rounded)     1948     1953     1948       1953
Grapefruit                         30,000     $3.30    $4.40    99,000    132,000
Lemons                             10,000      6.82     7.61    68,200     76,100
Oranges, Florida                   60,000      3.41     4.36   204,600    261,600
Oranges, California, Navel         20,000      5.16     5.33   103,200    106,600
Oranges, California, Valencia      30,000      4.32     5.77   129,600    173,100
Aggregate value                      ...        ...      ...   604,600    749,400
Index number (per cent of 1948)      ...        ...      ...     100.0     123.95

* See note to Table 17.2 concerning crop years.

Data from Table 17.4.

Over a number of years, various changes take place: commodities shift
considerably in their relative importance; old commodities disappear from
use and are succeeded by new commodities; models, styles, or grades of a
commodity become obsolete and cease to be manufactured, with new
models, styles, or grades taking their place; marketing centers shift, so
that a price quotation at the new center must replace that at the old;
f.o.b. price quotations may give way to delivered prices, or vice versa.
Under any of these circumstances it may be desirable to express each
index number, not as a percentage of the original base, but as a percentage
of the preceding period. Such an index might employ any of the formulas
given above, utilizing weights pertaining to either or both of the
years or months being compared. Frequently these separate percentages
are chained back to the original base by a process of successive
multiplication. Such an index, known as a chain index, will be further described
in the following chapter. When substituting one commodity for another,
or when changing weights, overlapping data are needed for only a single
period, as a direct comparison is made only between the prices (or
quantities) of the current period and those of the preceding period.
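The chaining process described above, successive multiplication of period-to-period percentages, may be sketched as follows. The link relatives used here are hypothetical figures chosen only to show the arithmetic, and the function name `chain` is an assumption of this sketch rather than anything taken from the text.

```python
def chain(link_relatives, base=100.0):
    """Chain period-to-period percentages back to the original base.

    Each link relative is the index of a period expressed as a
    percentage of the immediately preceding period.
    """
    index = [base]
    for link in link_relatives:
        index.append(index[-1] * link / 100.0)
    return index


# Hypothetical links: each period as a per cent of the period before it.
links = [110.0, 95.0, 120.0]
chained = chain(links)
print([round(value, 1) for value in chained])
```

Because each link compares only two adjacent periods, a commodity or a set of weights can be changed between links without disturbing the links already computed.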
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AVERAGES OF PRICE RELATIVES 


Two basic steps are involved in constructing indexes by averaging price
relatives.

1. Convert the actual prices for each series to percentages of the base
period. These percentages are called price relatives, since they are
expressed, not in dollars and cents, but as percentages relative to the
price in the base period. The upper part of Table 17.6 shows the price
relatives for the five citrus fruits from 1948 through 1953. Each of these
series of relatives was computed in the same manner as were the relatives
for Florida oranges in Table 17.1, which are here repeated in the third
row of figures in Table 17.6.

TABLE 17.6

Construction of Index Numbers of Citrus Fruit Prices, 1948-1953,* by Use of
Simple Arithmetic Mean of Price Relatives

Fruit                              1948    1949    1950    1951    1952    1953
Grapefruit                        100.0   121.2   161.2   130.6   121.5   133.3
Lemons                            100.0   115.1   112.9   109.2   115.0   111.6
Oranges, Florida                  100.0   128.4   146.6   130.5   111.7   127.9
Oranges, California, Navel        100.0   128.3   101.4   111.8   136.6   103.3
Oranges, California, Valencia     100.0   122.9   118.5   127.3   129.2   133.6
Total                             500.0   615.9   640.6   609.4   614.0   609.7
Average (per cent of 1948)        100.0   123.2   128.1   121.9   122.8   121.9

* See note to Table 17.2 concerning crop years.

Based on data of Table 17.2.

TABLE 17.7

Construction of Index Numbers of Citrus Fruit Prices, 1948-1953,* by Use of
Arithmetic Means of Price Relatives Weighted by Base-Year (1948) Values

(In thousands of dollars)

                                            Price relative of specified year multiplied by 1948 value
                                  1948
Fruit                             value     1948      1949      1950      1951      1952      1953
Grapefruit                      108,900   108,900   131,987   175,547   142,223   132,314   145,164
Lemons                           87,773    87,773   101,027    99,096    95,848   100,939    97,955
Oranges, Florida                199,144   199,144   255,701   291,945   259,883   222,444   254,705
Oranges, California, Navel       97,524    97,524   125,123    98,889   109,032   133,218   100,742
Oranges, California, Valencia   116,338   116,338   142,979   137,861   148,098   150,309   155,428
Total                           609,679   609,679   756,817   803,338   755,084   739,224   753,994
Index number (per cent of 1948)     ...     100.0     124.1     131.8     123.8     121.2     123.7

* See note to Table 17.2 concerning crop years.

Based on price relatives in Table 17.6 and 1948 value data in Table 17.4.

2. Average the price relatives for each year separately, thus obtaining a
series of index numbers. In the lower part of Table 17.6 a simple
arithmetic mean of the relatives has been used. The shortcoming of this
method is that each relative (irrespective of the importance of the
commodity which it represents) influences the index number for a given
year according to its percentage of increase or decrease over the base
period. Chart 17.3 shows the index and the five series of price relatives.
From this chart it may be seen that in 1950 two relatives increased, while
three declined, but the index rose because the two relatives which
increased more than offset the three which declined. The two relatives
which increased might have represented minor components of the index;
the result would have been the same. It may be worth while to point
out that the simple arithmetic mean of price relatives is equivalent to a
weighted aggregative index, where the weights are the amount of each
commodity purchasable by $1.00 (or any specified amount) in the base
year. This is the same as weighting by the reciprocals of base-year
prices.

Chart 17.3. Simple Arithmetic Average Index Number of Citrus Fruit
Prices and Price Relatives of Each of the Five Fruits, 1948-1953. 1948 = 100.
Data from Table 17.6.
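The equivalence just stated, between the simple arithmetic mean of relatives and an aggregative index weighted by the reciprocals of base-year prices, can be illustrated numerically. This is only a sketch: it assumes the 1948 and 1953 citrus prices per box from the chapter's tables, and the variable names are introduced here for illustration.

```python
# Prices in dollars per box for the five citrus fruits.
p48 = [3.30, 6.82, 3.41, 5.16, 4.32]
p53 = [4.40, 7.61, 4.36, 5.33, 5.77]

# Simple arithmetic mean of the 1953 price relatives (1948 = 100).
relatives = [100 * pn / p0 for p0, pn in zip(p48, p53)]
simple_mean = sum(relatives) / len(relatives)

# Hidden weights: the number of boxes purchasable for $1.00 in 1948.
boxes_per_dollar = [1 / p0 for p0 in p48]
aggregative = (100 * sum(pn * q for pn, q in zip(p53, boxes_per_dollar))
               / sum(p0 * q for p0, q in zip(p48, boxes_per_dollar)))

print(round(simple_mean, 1), round(aggregative, 1))
```

The two figures agree because each base-year term in the denominator equals exactly $1.00, so the denominator is simply the number of commodities.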

It is, of course, possible to use averages other than the arithmetic
mean, for example, the geometric mean, the median, or the harmonic
mean, and some attention will be given to this topic later. More
important, however, is the application of weights to the relatives. These
weights should be value weights, in contrast to the quantity weights used
with the aggregative method. The reason for this will be apparent
shortly. Table 17.7 shows the computation of an index of citrus fruit
prices with the relatives of Table 17.6 weighted by the value of each fruit
in the base year, 1948. As is apparent from the table, the procedure
consists of: (1) multiplying the relatives by their weights, (2) summing these
products year by year, and (3) dividing these totals for each year by the
sum of the weights. Except for differences due to rounding, the results
are the same as those obtained for the aggregative index with
base-year-quantity weights (Table 17.3). That this should be so can be
demonstrated simply. Let us first take a single commodity, Florida oranges,
and show that (A) the base-year (1948) value weight applied to the
given-year (1953) relative produces the same result as (B) the base-year (1948)
quantity times the given-year (1953) price. That is:

(A) The price relative for 1953 is $4.36 ÷ $3.41 = 1.2786, or 127.86 per cent;
    the base-year value times the 1953 price relative is
    $199,144,000 × 1.2786 = $254,626,000.

(B) The base-year quantity times the given-year price is
    58,400,000 × $4.36 = $254,624,000.

(Table 17.7 shows $254,705,000 for Florida oranges for 1953 because the
1953 relative was taken as 127.9.)

This relationship is true, not only for each individual commodity, but
for groups of commodities as well. In symbols:

$$\frac{\sum \left(\dfrac{p_n}{p_0}\right) p_0 q_0}{\sum p_0 q_0} = \frac{\sum p_n q_0}{\sum p_0 q_0}$$
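The identity for groups of commodities can be verified on the citrus data. The sketch below assumes the 1948 prices and quantities and the 1953 prices from the chapter's tables; the variable names are hypothetical. It uses unrounded relatives, which is why the two results agree exactly rather than only "except for differences due to rounding."

```python
# Prices in dollars per box; 1948 production in thousands of boxes.
p48 = [3.30, 6.82, 3.41, 5.16, 4.32]
p53 = [4.40, 7.61, 4.36, 5.33, 5.77]
q48 = [33000, 12870, 58400, 18900, 26930]

# Base-year value weights and unrounded 1953 price relatives.
values_48 = [p0 * q0 for p0, q0 in zip(p48, q48)]
relatives = [100 * pn / p0 for p0, pn in zip(p48, p53)]

# Arithmetic mean of relatives weighted by base-year values.
weighted_mean_of_relatives = (
    sum(r * v for r, v in zip(relatives, values_48)) / sum(values_48)
)

# Aggregative index weighted by base-year quantities.
aggregative_base_weights = (
    100 * sum(pn * q0 for pn, q0 in zip(p53, q48)) / sum(values_48)
)

print(round(weighted_mean_of_relatives, 1), round(aggregative_base_weights, 1))
```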


More generally, the following relationships may be set forth with regard to price
index numbers:

(1) An arithmetic average of relatives weighted by base-year values is the
equivalent of an aggregative index weighted with base-year quantities.

(2) Similarly, an arithmetic average of relatives weighted by the product of
base-year prices and given-year quantities is the equivalent of an aggregative index
weighted with given-year quantities.

(3) A harmonic average of relatives weighted by given-year values is the
equivalent of an aggregative index weighted with given-year quantities. Thus,

$$\frac{\sum p_n q_n}{\sum \dfrac{p_n q_n}{p_n / p_0}} = \frac{\sum p_n q_n}{\sum p_0 q_n}$$

(4) Similarly, it may be shown that a harmonic average of relatives weighted by the
product of base-year quantities and given-year prices is the equivalent of an
aggregative index weighted with base-year quantities.

These generalizations may be stated in the form of guides to the construction of
index numbers, when the index numbers are to be constructed from relatives:

(a) If it is desired to use the arithmetic average of relatives, the value weights should
be the products of the base prices and whatever quantities are desired.

(b) If it is desired to use an average of relatives employing value weights that are
the product of given-year prices and quantities of some period, the harmonic average
should be used.

Under no circumstances should the arithmetic average of relatives be used with
values involving given-year prices, since this gives extra weight to a commodity
merely because it has gone up in price. Such a procedure results in an upward bias.

Evidently the method of weighted average of relatives with
base-year-value weights is usually a roundabout method of doing what may more
easily be accomplished by direct means using aggregates with
base-year-quantity weights. Furthermore, the meaning of an aggregative index
seems clearer to most persons than does an average of relatives. Why,
then, should not the aggregative method always be used? One reason is
that the price relatives themselves are occasionally worth studying, not
only because an individual series may hold special significance for the
reader, but because a study of groups of relatives may assist in selecting
a sample or determining what group indexes to make. In connection
with frequency distributions, it was observed that an average never gives
a complete picture of any situation. Other measures may be worth
making. Another reason is that the series to be combined can sometimes
be obtained only in the form of relatives, or, they may have meaning only
as relatives because, as in the case of quantity indexes, a series may
consist of several subseries expressed in different physical units. The use of
relatives is more common in the construction of quantity indexes (to be
discussed later) than in the making of price indexes, since the components
of quantity indexes are themselves often indexes or relatives.

Commodity weights versus group weights. The same practical
advice may be offered concerning value weights that was given concerning
quantity weights: only approximate accuracy is necessary. Nevertheless,
the following consideration becomes important when only a limited
number of commodities is chosen: Should the value weight selected for
any given commodity be the value of that commodity entering the market,
or should it refer to the whole group of commodities which the commodity
represents? The answer to this question is that, unless it is practicable
to increase the number of items in some groups (and perhaps decrease the
number in others) sufficiently to obtain proportionate value representation
for the different groups, it is decidedly better to adjust the weights
of the different items so as to obtain such group representation. Most
satisfactory results will be obtained if we select as large a number of
commodities from each group as feasible, and at the same time give
additional weight to those elements that are under-represented.

Another method of accomplishing the same result is to select as many
commodities as convenient for each group, to compute separate group
indexes, and then to combine the group indexes into a general index, using
the appropriate weights. Since the group indexes are relatives, their
combination presents no new problem. It might further be noticed that
weighting of commodities may in a sense be regarded as a substitute for
selecting the number of commodities from the different groups in
proportion to the value of those groups.

Types of averages. The geometric mean. Sometimes it is argued
that the geometric mean should be used for averaging price relatives.

Let us consider a simple case using only two commodities and involving
the measurement of price level between two countries. Using Country
A as the base, we get the following results, showing that, according to
the arithmetic mean, the price level in Country B is 25 per cent higher
than in Country A.

From calculations with Country B as the base, the arithmetic mean
indicates that the price level in Country A is 25 per cent higher than in
Country B.

The results of the computations in the two tables appear to be
inconsistent. However, they are inconsistent, not because of a shortcoming
of the arithmetic mean, but because of hidden weights which are not the
same in the two situations. When Country A was the base, it was
assumed that the amounts of wheat and cotton purchased in Country A
would be the number of units of wheat (1 1/4 bushels) and the number of
units of cotton (8 1/3 pounds) purchased by $1.00 (or other specified
amount of money), and that the same weights would hold for Country B.
That is, for Country A:

1 1/4 bushels of wheat @ $0.80 = $1.00; relative = 100;
8 1/3 pounds of cotton @   .12 =  1.00; relative = 100;

and for Country B:

1 1/4 bushels of wheat @ $1.60 = $2.00; relative = 200;
8 1/3 pounds of cotton @   .06 =   .50; relative =  50.

On this basis, the price level in Country B is 25 per cent higher than in
Country A.

When Country B was the base, it was assumed that the amounts of
wheat and cotton purchased in Country B would be the number of
units of wheat (5/8 bushel) and the number of units of cotton (16 2/3
pounds) purchased by $1.00 (or other specified amount of money),
and that the same weights would hold for Country A.

This gives, for Country B:

5/8 bushel of wheat     @ $1.60 = $1.00; relative = 100;
16 2/3 pounds of cotton @   .06 =  1.00; relative = 100;

and for Country A:

5/8 bushel of wheat     @ $0.80 = $0.50; relative =  50;
16 2/3 pounds of cotton @   .12 =  2.00; relative = 200.

Use of this set of weights indicates that the price level in Country A is
25 per cent higher than in Country B.

Now, the geometric mean is sometimes advocated because it gives
consistent results in situations such as those shown in the two tables above.
The results are consistent because, with either country as the base, the
index number for the other country is 100, as may be seen in the tables.
But the geometric mean yields consistent results only because of the
assumption inherent in it. This is that the value of the two
commodities purchased will be in the same ratio in the two countries. This
means that more wheat would be bought in Country A than in Country
B, and that more cotton would be bought in Country B than in A.
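The two-country illustration can be reproduced in a few lines. The sketch below uses the wheat and cotton prices quoted in the text ($0.80 and $0.12 in Country A, $1.60 and $0.06 in Country B); the function names are introduced here for illustration, and `math.prod` assumes Python 3.8 or later.

```python
from math import prod

prices_a = {"wheat": 0.80, "cotton": 0.12}
prices_b = {"wheat": 1.60, "cotton": 0.06}


def relatives(given, base):
    """Price relatives of `given` as percentages of `base`."""
    return [100 * given[item] / base[item] for item in base]


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return prod(values) ** (1 / len(values))


b_on_a = relatives(prices_b, prices_a)   # Country A as base: 200 and 50
a_on_b = relatives(prices_a, prices_b)   # Country B as base: 50 and 200

# The arithmetic mean makes each country appear 25 per cent dearer than
# the other; the geometric mean gives 100 with either base.
print(round(arithmetic_mean(b_on_a), 1), round(arithmetic_mean(a_on_b), 1))
print(round(geometric_mean(b_on_a), 1), round(geometric_mean(a_on_b), 1))
```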

In the foregoing paragraphs, no weights had been specified for the
index numbers which were made. We have already seen that relatives
should be weighted by properly selected values, and for the illustrations
just given those weights should be determined upon the basis of the
actual value of the commodities sold in the two countries.

Another argument for the geometric mean is based upon the assertion
that frequency distributions of price relatives tend to form a normal
distribution when plotted on paper having a logarithmic X scale. Such
a frequency distribution, but not of price relatives, is shown in Charts
25.18 and 28. bi. The reasoning runs as follows: the doubling of a price
represents as important a divergence (and is as likely to occur) as a
decline to one-half of its former level; it is as likely to increase to 3/2 of
the base period as to fall to 2/3 of the base period; it is as likely to rise to
infinity as it is to fall to zero. The resulting frequency distribution
therefore tends to be normal geometrically, and the geometric mean,
which coincides with the mode of such a distribution, is the appropriate
average. This argument is logical but is based upon premises
that are not fully established. We are not sure that a price is as likely
to double as to drop one-half, or as likely to increase 50 per cent as to
drop one-third; and, unless balancing of this sort takes place, we do not
have an appropriate basis for using the geometric mean.

It should not be thought that the geometric mean must never be used;
it merely is to be doubted that it has any inherent general superiority
over the arithmetic mean. It is the belief of the authors that the average
to use is determined in large part by the use for which the index
numbers are intended. If, as is very often the case, we wish to compare
the amount of money required at two different times or in two different
places to purchase the same commodities (or perhaps the same amount
of satisfaction by like individuals, with tastes and environment held
constant), the weighted arithmetic mean should be used. This is because,
as has been shown, such an index number may also be regarded as a
weighted aggregative index number. On the other hand, if the primary
object is the study of price relatives, including their average behavior,
the geometric mean may be useful.

The mode, the median, and the harmonic mean. Use of the mode is
virtually never advocated, the primary reason being that ordinarily no
clearly defined mode would be present in a group of price relatives. The
median is seldom used, but it might be appropriate if doubt exists
concerning the accuracy or representative character of some of the data.
Of course, the presence of such a doubt may actually mean that the basic
data were not properly gathered. Use of the harmonic mean has been
suggested by Ferger (see footnote 2 in Chapter 18) if it is desired to use
the reciprocal of a price index as an index of the purchasing power of
money.

Comparison of the four types of price indexes. Before beginning
the consideration of quantity indexes, it may be well to pause a moment,
and to compare the results of the four types of price indexes which have
been discussed. Chart 17.4 shows these four indexes, but it has three
curves rather than four, because two of the indexes coincide. As we
already know, the two that are alike are the aggregative with
base-year-quantity weights and the arithmetic average of relatives weighted by
base-year values. Note the general agreement of all three curves,
although there are some important differences in magnitude (for example,
in 1950) and in direction (for example, in 1952). The simple aggregative
and the simple arithmetic average of relatives, both of which
have logical shortcomings, both failed to go high enough in four years
and in two instances moved in the wrong direction.

Chart 17.4. Index Numbers of Citrus Fruit Prices, as Obtained by Different
Methods, 1948-1953. Data from Tables 17.2, 17.3, 17.6, and 17.7.


QUANTITY INDEX NUMBERS

Aggregative type. An aggregative index number of quantity
(physical volume) is the counterpart of the corresponding price index.
Thus, the construction of a simple aggregative quantity index would
involve the formula

$$Q = \frac{\sum q_n}{\sum q_0}$$

and Table 17.8 shows the computation of such a quantity index for citrus
fruits. Ordinarily, an index computed in this way is obviously illogical,
since it involves adding quantities expressed in different units, such as
tons, thousands of board feet, kilowatt hours, and so on. For the citrus
fruit, it would have been possible to express all production in terms of
pounds, but even this would not yield a satisfactory index, since the
relative importance of each fruit in the economy would have been ignored.
Using base-year prices as weights, the formula becomes

$$Q = \frac{\sum q_n p_0}{\sum q_0 p_0}$$

The construction of this weighted aggregative quantity index, with
1948 = 100, is shown in Table 17.9.

Just as the aggregative index number of price measures the changing
value of a fixed aggregate of goods at varying prices, so the aggregative
index number of physical volume measures the changing value of a
varying aggregate of goods at fixed prices. The price index answers
the question: If we buy the same assortment of goods each year, but
at different prices, how much will we spend each year? The physical
volume index answers the question: If we buy varying quantities of
specified goods each year, but at the same price, how much will we spend
each year? While in the former case the difference in amount spent was
due to price change, in the latter case the difference must, of course, be
attributed to changes in quantities bought and sold, since prices were
held constant. Thus an index, computed by use of the formula last
given, tells us the comparative quantities (produced, sold, consumed, and
so forth) for each of the periods covered.
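Both quantity formulas can be tried on the citrus production data. The following sketch assumes the production figures (thousands of boxes) and the 1948 prices per box from the chapter's tables; the dictionary layout and function names are illustrative choices, not part of the original.

```python
years = [1948, 1949, 1950, 1951, 1952, 1953]

# 1948 price per box (dollars) and production (thousands of boxes).
price_1948 = {
    "Grapefruit": 3.30,
    "Lemons": 6.82,
    "Oranges, Florida": 3.41,
    "Oranges, California, Navel": 5.16,
    "Oranges, California, Valencia": 4.32,
}
production = {
    "Grapefruit": [33000, 30200, 24200, 33200, 36000, 32500],
    "Lemons": [12870, 10010, 11360, 13450, 12800, 11900],
    "Oranges, Florida": [58400, 58300, 58500, 67300, 78600, 72200],
    "Oranges, California, Navel": [18900, 11910, 15630, 14610, 12600, 16630],
    "Oranges, California, Valencia": [26930, 25100, 26230, 30600, 25810, 28700],
}


def simple_quantity_index(quantities):
    """Sum of quantities each year as a percentage of the base-year sum."""
    totals = [sum(series[i] for series in quantities.values())
              for i in range(len(years))]
    return [100 * total / totals[0] for total in totals]


def weighted_quantity_index(quantities, base_prices):
    """Quantities valued at fixed base-year prices, base year = 100."""
    values = [sum(base_prices[fruit] * series[i]
                  for fruit, series in quantities.items())
              for i in range(len(years))]
    return [100 * value / values[0] for value in values]


simple_q = [round(x, 1) for x in simple_quantity_index(production)]
weighted_q = [round(x, 1) for x in weighted_quantity_index(production, price_1948)]
print(simple_q)
print(weighted_q)
```

In this sketch the weighted index falls further below 100 in 1949 than the simple index does; one reading is that the sharpest production decline that year was in California navel oranges, which carry a comparatively high 1948 price.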


TABLE 17.8

Construction of Simple Aggregative Index Numbers of Citrus Fruit
Production, 1948-1953*

(Quantities in thousands of boxes)

Fruit                               1948      1949      1950      1951      1952      1953
Grapefruit                        33,000    30,200    24,200    33,200    36,000    32,500
Lemons                            12,870    10,010    11,360    13,450    12,800    11,900
Oranges, Florida                  58,400    58,300    58,500    67,300    78,600    72,200
Oranges, California, Navel        18,900    11,910    15,630    14,610    12,600    16,630
Oranges, California, Valencia     26,930    25,100    26,230    30,600    25,810    28,700
Aggregate                        150,100   135,520   135,920   159,160   165,810   161,930
Index number (per cent of 1948)    100.0      90.3      90.6     106.0     110.5     107.9

* See note to Table 17.2 concerning crop years.
Data from sources given below Table 17.2.


TABLE 17.9

Construction of Aggregative Index Numbers of Citrus Fruit Production,
1948-1953,* Weighted by Prices in 1948

(Values in thousands of dollars)

                                  1948        Value of amount produced in specified year at 1948 price
                                  price
Fruit                            per box     1948      1949      1950      1951      1952      1953
Grapefruit                        $3.30    108,900    99,660    79,860   109,560   118,800   107,250
Lemons                             6.82     87,773    68,268    77,475    91,729    87,296    81,158
Oranges, Florida                   3.41    199,144   198,803   199,485   229,493   268,026   246,202
Oranges, California, Navel         5.16     97,524    61,456    80,651    75,388    65,016    85,811
Oranges, California, Valencia      4.32    116,338   108,432   113,314   132,192   111,499   123,984
Aggregate value                     ...    609,679   536,619   550,785   638,362   650,637   644,405
Index number (per cent of 1948)     ...      100.0      88.0      90.3     104.7     106.7     105.7

* See note to Table 17.2 concerning crop years.

Based on quantity data of Table 17.8 and 1948 price data in Table 17.2.


Various methods of weighting are available for the construction of
quantity index numbers, and in general the same considerations apply
that were discussed in connection with price index numbers. In obtaining
price weights which are averages of two or more years, the average
prices should be weighted-average prices, obtained by dividing the total
value sold in these years by the total number of units in those same years.
Thus, if average quantities of base and given years are used, we have the
rather formidable-looking formula

$$Q = \frac{\sum q_n \left(\dfrac{p_0 q_0 + p_n q_n}{q_0 + q_n}\right)}{\sum q_0 \left(\dfrac{p_0 q_0 + p_n q_n}{q_0 + q_n}\right)}$$

Likewise, if the common-factor method is used, the price weight should
be derived from the largest value that is common to all the years in
question.


TABLE 17.10

Construction of Index Numbers of Citrus Fruit Production, 1948-1953,*
by Use of Simple Arithmetic Mean of Quantity Relatives

Fruit                              1948    1949    1950    1951    1952    1953
Grapefruit                        100.0    91.5    73.3   100.6   109.1    98.5
Lemons                            100.0    77.8    88.3   104.5    99.5    92.5
Oranges, Florida                  100.0    99.8   100.2   115.2   134.6   123.6
Oranges, California, Navel        100.0    63.0    82.7    77.3    66.7    88.0
Oranges, California, Valencia     100.0    93.2    97.4   113.6    95.8   106.6
Total                             500.0   425.3   441.9   511.2   505.7   509.2
Average (per cent of 1948)        100.0    85.1    88.4   102.2   101.1   101.8

* See note to Table 17.2 concerning crop years.

Based on data of Table 17.8.


TABLE 17.11

Construction of Index Numbers of Citrus Fruit Production, 1948-1953,*
by Use of Arithmetic Means of Quantity Relatives Weighted by Base-
Year (1948) Values

                                 1948       Quantity relative of specified year multiplied by 1948 value
Fruit                            value
                                            1948         1949         1950         1951         1952         1953

Grapefruit                       108,903    108,903      [illegible]  [illegible]  [illegible]  [illegible]  [illegible]
Lemons                           87,773     87,773       68,287       77,504       [illegible]  87,334       81,190
Oranges, Florida                 199,144    199,144      198,746      [illegible]  229,414      268,048      246,142
Oranges, California, Navel       97,521     97,521       [illegible]  [illegible]  [illegible]  [illegible]  [illegible]
Oranges, California, Valencia    116,338    116,338      108,427      113,314      132,160      [illegible]  124,016

Total                            609,679    609,679      [illegible]  550,845      [illegible]  [illegible]  644,435
Index number (per cent of 1948)             100.0        88.0         90.3         104.7        106.7        105.7

* See note to Table 17.2 concerning crop years.
Based on quantity relatives in Table 17.10 and 1948 values in Table 17.9.


Averages of relatives. This method of constructing quantity index
numbers is strictly analogous to the method applied to the measuring of
price changes. The procedure is illustrated by Tables 17.10 and 17.11.
As was found to be true with price index numbers, the use of base-year-
value weights produces the same result as the aggregative method employ-
ing base-year-quantity weights.
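The equivalence just stated can be checked numerically. The sketch below is only an illustration: the three commodities and their prices and quantities are hypothetical, not the citrus data of the tables, and the function names are introduced here for convenience. It computes a quantity index first as an aggregative index weighted by base-year prices and then as an arithmetic mean of quantity relatives weighted by base-year values.

```python
# Hypothetical prices and quantities for three commodities (illustration only).
p_o = [2.00, 5.00, 1.50]   # base-year prices
q_o = [10.0, 4.0, 20.0]    # base-year quantities
q_n = [12.0, 3.0, 25.0]    # given-year quantities


def aggregative_quantity_index(p_o, q_o, q_n):
    """Sum(q_n * p_o) / Sum(q_o * p_o), expressed as a percentage."""
    given = sum(q * p for q, p in zip(q_n, p_o))
    base = sum(q * p for q, p in zip(q_o, p_o))
    return 100.0 * given / base


def weighted_mean_of_relatives(p_o, q_o, q_n):
    """Arithmetic mean of quantity relatives q_n/q_o, weighted by base-year values p_o*q_o."""
    weights = [p * q for p, q in zip(p_o, q_o)]
    relatives = [100.0 * n / o for n, o in zip(q_n, q_o)]
    return sum(w * r for w, r in zip(weights, relatives)) / sum(weights)


agg_index = aggregative_quantity_index(p_o, q_o, q_n)
avg_index = weighted_mean_of_relatives(p_o, q_o, q_n)
print(round(agg_index, 2), round(avg_index, 2))
```

The two results agree because each weight $p_o q_o$ multiplied by its relative $q_n/q_o$ reduces to $p_o q_n$, which is the corresponding term of the aggregative numerator.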
Because of ease of computation and simplicity of meaning, the aggrega-
tive method is to be preferred to the average-of-relatives method wher-
ever it is applicable. As noted before, there are circumstances when the
aggregative method cannot be used. Not previously mentioned is the
situation that obtains when the relatives which are to be averaged are
percentages, not of a fixed base but of a changing normal. Here, of
course, the average-of-relatives method is necessary. In other words, the
aggregative method cannot be used if an index of business cycles is to be
constructed, since the data to be averaged are percentages of trend and
seasonal.

Usually the weights selected for an average of quantity relatives are in
proportion to the values in exchange of the different series. Occasionally,
some consideration is given also to the relative amplitude of the different
series, if they are cyclical relatives. If an index is constructed, not for the
purpose of measuring changes but for the purpose of forecasting changes,
the basis of selecting weights will be, not the economic importance of the
different series represented, but their importance for purposes of forecasting.

Chapter 18 will describe methods of constructing a number of impor-
tant indexes and will discuss certain points of technique and theory not
covered in this chapter.



Symbols Used in Chapter 18


$p$: price of a commodity.

$P$: price index number.

$q$: quantity of a commodity.

$Q$: quantity index number.

$n$: a subscript indicating a given period or the current period.

$o$: a subscript indicating the base period.

$\Sigma$: upper-case Greek sigma, meaning “take the sum of.”

$v$: units of purchasing power per dollar.

Numerical subscripts to $p$ and $q$, when written 53 or 47-49, for example,
indicate that the price or quantity referred to is for the year specified
or is the average (or total) for the years separated by the hyphen.
Index Number Theory and Practice


The object of this chapter is twofold. First, the theory of index num-
bers and certain refinements of technique will be further discussed. Sec-
ond, a description of a number of indexes will be given. The indexes were
selected partly on account of their wide usefulness, and partly on account
of the interesting technique which they employ. In general it will be
found that in actual practice the procedures outlined in Chapter 17 will
not be followed exactly, but that in each case there are special circumstances
which justify special modifications of method.

INDEX NUMBER CONCEPTS

Mathematical tests. One school of thought on index numbers
believes that there may be such a thing as a perfect index number formula,
and that such a formula can be recognized by its ability to meet certain
mathematical “tests” of consistency. Whether or not these tests are
logically valid is an open question. Not only can an index be considered
“ideal” if it meets these tests, according to this theory, but other indexes
that do not meet them can be graded according to how closely they
approximate them in actual practice.

The tests are derived by the logic of analogy. Anything that is true of
an individual commodity should also be true of a group of commodities
considered as a whole. If a box of oranges was worth 125 per cent as
much in 1953 as it was in 1948, then the 1948 price was 80 per cent of the
1953 price. Reasoning by analogy, if an index number for 1953 was 125
with respect to a 1948 base, then the index number for 1948 should be
80 with respect to a 1953 base. In other words, an index number should
work backward as well as forward.

Again, suppose that a commodity increases from 40 cents to 60 cents
and that the sales increase from 2 units to 4 units. The price is 150 per
cent of the base year, the quantity sales are 200 per cent, while the value
is 1.50 × 2.00 = 3.00 times the base year, or 300 per cent of the base year.
This is verified by noting that

\[
\frac{0.60 \times 4}{0.40 \times 2} = 3.
\]

Once more reasoning from analogy, it may be argued that a price index
times a quantity index computed from the same data should equal the
relative value of the transactions in the given year with respect to the
base year. In other words, if

\[
\frac{p_n}{p_o} \times \frac{q_n}{q_o} = \frac{p_n q_n}{p_o q_o}
\]

then it should be true that

\[
P \times Q = \frac{\sum p_n q_n}{\sum p_o q_o}.
\]

The first draft of this chapter was prepared by Dr. James O. Paris.

As indicated in the preceding paragraph, there are two tests which are
considered especially important by the “mathematical test” school.
They are called: (1) the time reversal test; (2) the factor reversal test.

The time reversal test may be stated more precisely as follows: If the
time subscripts of a price (or quantity) index number formula be inter-
changed, the resulting price (or quantity) formula should be the reciprocal
of the original formula. If we take the formula

\[
\frac{\sum p_n q_o}{\sum p_o q_o}
\]

and interchange the time subscripts, the resulting formula is

\[
\frac{\sum p_o q_n}{\sum p_n q_n}.
\]

But

\[
\frac{\sum p_n q_o}{\sum p_o q_o} \times \frac{\sum p_o q_n}{\sum p_n q_n} \neq 1;
\]

hence the test is not met. On the other hand, the formula

\[
\sqrt{\frac{\sum p_n q_o}{\sum p_o q_o} \times \frac{\sum p_n q_n}{\sum p_o q_n}}
\]

becomes

\[
\sqrt{\frac{\sum p_o q_n}{\sum p_n q_n} \times \frac{\sum p_o q_o}{\sum p_n q_o}};
\]

the product of the two expressions is unity, and Irving Fisher’s “ideal”
index meets the time reversal test.

The factor reversal test may be stated in this way: If the $p$ and $q$ factors
in a price (or quantity) index formula be interchanged, so that a quantity
(or price) index formula is obtained, the product of the two indexes should
give the true value ratio

\[
\frac{\sum p_n q_n}{\sum p_o q_o}.
\]

Again taking the formula

\[
\frac{\sum p_n q_o}{\sum p_o q_o},
\]

we transform it into

\[
\frac{\sum q_n p_o}{\sum q_o p_o}.
\]

This is a quantity index, but since

\[
\frac{\sum p_n q_o}{\sum p_o q_o} \times \frac{\sum q_n p_o}{\sum q_o p_o} \neq \frac{\sum p_n q_n}{\sum p_o q_o},
\]

the test is not met.

However, we find that

\[
\sqrt{\frac{\sum p_n q_o}{\sum p_o q_o} \times \frac{\sum p_n q_n}{\sum p_o q_n}}
\]

transforms into

\[
\sqrt{\frac{\sum q_n p_o}{\sum q_o p_o} \times \frac{\sum q_n p_n}{\sum q_o p_n}}.
\]

The product of these two “ideal” indexes is

\[
\frac{\sum p_n q_n}{\sum p_o q_o}
\]

and the test is met.

Fisher’s “ideal” index number is so called because it is one of an
extremely limited number of indexes that meet both of these tests.
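Both tests can be tried numerically. The sketch below is a minimal illustration, not the authors' computation: the prices and quantities are hypothetical, and the function names (`base_weighted`, `given_weighted`, `ideal`) are introduced here for convenience. It applies the time reversal and factor reversal tests to the base-year-weighted aggregative formula and to the “ideal” formula.

```python
from math import sqrt, isclose

# Hypothetical prices and quantities for three commodities (illustration only).
p_o = [0.40, 1.00, 2.50]   # base-year prices
q_o = [2.0, 10.0, 4.0]     # base-year quantities
p_n = [0.60, 0.90, 3.00]   # given-year prices
q_n = [4.0, 12.0, 5.0]     # given-year quantities


def total(a, b):
    """Sum of products; total(p_n, q_o) is the sum of p_n * q_o."""
    return sum(x * y for x, y in zip(a, b))


def base_weighted(a_o, b_o, a_n, b_n):
    """Aggregative index of a weighted by base-period b."""
    return total(a_n, b_o) / total(a_o, b_o)


def given_weighted(a_o, b_o, a_n, b_n):
    """Aggregative index of a weighted by given-period b."""
    return total(a_n, b_n) / total(a_o, b_n)


def ideal(a_o, b_o, a_n, b_n):
    """Geometric mean of the base-weighted and given-weighted indexes."""
    return sqrt(base_weighted(a_o, b_o, a_n, b_n) * given_weighted(a_o, b_o, a_n, b_n))


def time_reversal_product(index):
    """Forward index times the index with time subscripts interchanged."""
    forward = index(p_o, q_o, p_n, q_n)
    backward = index(p_n, q_n, p_o, q_o)
    return forward * backward


def factor_reversal_product(index):
    """Price index times the quantity index obtained by interchanging p and q."""
    price_index = index(p_o, q_o, p_n, q_n)
    quantity_index = index(q_o, p_o, q_n, p_n)
    return price_index * quantity_index


value_ratio = total(p_n, q_n) / total(p_o, q_o)

for name, index in [("base-weighted", base_weighted), ("ideal", ideal)]:
    print(name,
          "time reversal met:", isclose(time_reversal_product(index), 1.0),
          "factor reversal met:", isclose(factor_reversal_product(index), value_ratio))
```

With these figures the base-weighted formula fails both tests and the “ideal” formula meets both, in agreement with the algebra above.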

Relationship of formula to use. The concept of an “ideal” index
is attacked by index number students belonging to a different school of
thought on the ground that the analyst cannot say exactly what the
“ideal” index measures; he can only assert vaguely that it measures a
change in the price level, or use some similar expression. To Willford I.
King,* the logical procedure is to ask a specific question, and then to
devise a formula which will answer that specific question. For instance,
the formula

\[
\frac{\sum p_n q_o}{\sum p_o q_o}
\]

applied to retail prices compares the cost in the present
year with the cost in the base year of supporting the physical scale of
living which obtained in the base year. While this is a specific question,
it may not be the most useful question to ask. Just what is an appro-
priate question to ask is an important problem facing the person conduct-
ing the investigation. In Chapter 17 Keynes was interpreted as believing
it appropriate that, for measuring changes in the value of money, one
should first seek an index number that would measure the changing cost
of aggregates of goods yielding the same utility to similar groups of per-
sons at two periods. Now the formula

\[
\frac{\sum p_n q_o}{\sum p_o q_o}
\]

assumes that, if their tastes
do not change, people will continue to buy the same amounts of goods no
matter how great the price rise or fall, while actually there is a shift from
those items which are becoming more expensive to those which are
becoming cheaper. This formula, then, would have an upward “bias,”
since the cost of obtaining the same quantity of goods would be higher
than the cost of obtaining the same quantity of utility. The formula

\[
\frac{\sum p_n q_n}{\sum p_o q_n},
\]

on the other hand, compares the cost of supporting one’s present
physical scale of living with its cost in the base year. This formula, from
the same point of view, has a downward “bias,” since no sensible person
would have bought the same goods in the base year as he does now (even
granting the same tastes and environment), because the relative prices of
goods would have been different. The cost of obtaining the present
year’s bill of goods in the base year would have been greater than the cost
of obtaining the current year’s economic satisfactions.

* See Willford I. King, Index Numbers Elucidated, Longmans, Green and Company,
New York, 1930, especially Chapter III. The reader may also wish to refer to B. D.
Mudgett, Index Numbers, John Wiley & Sons, Inc., New York, 1951, Chapter 4.

Fisher’s “ideal” index formula is the geometric mean of two index num-
bers biased (or inappropriate) in opposite directions; and many persons
hold that the average of two wrong answers does not necessarily give one
right answer, even though the two errors are in opposite directions and
even though the formula is internally consistent. On the other hand, it
is doubtful that Keynes’ common-factor method will in actual practice
answer Keynes’ question any better than (if as well as) the “ideal” index
number. Changes in relative prices with consequent changes in relative
quantities purchased may reduce the value of the common factor to a
small proportion of the total goods bought. Nevertheless, it is still
another attempt to arrive at a logical decision as to exactly what one is
trying to measure.

For purposes of measuring changes in the value of money (purchasing
power of the dollar), it is customary to use the reciprocal of a price
index. Ferger, however, argues that this is illogical.* Just as a price
index averages together price changes of specific commodities, so a pur-
chasing power index should average together changes in the purchasing
power of the dollar for specific commodities. If the price of corn is $.50
per bushel, the purchasing power of the dollar for corn is 2 bushels.
Designating units of purchasing power per dollar by the symbol $v$, Ferger
suggests this purchasing power index number formula:

\[
\text{Purchasing power} = \frac{\sum \dfrac{v_n}{v_o}\, p_o q_o}{\sum p_o q_o}.
\]

But since $v = \dfrac{1}{p}$, we may write

\[
\text{Purchasing power} = \frac{\sum \dfrac{p_o}{p_n}\, p_o q_o}{\sum p_o q_o}.
\]

This expression is the reciprocal of the harmonic mean of price relatives
weighted by base-year values, since the latter is

\[
\frac{\sum p_o q_o}{\sum \dfrac{p_o}{p_n}\, p_o q_o}.
\]

So Ferger’s formula is still in effect (though not in concept) the reciprocal
of a price index, though not the usual index based on the arithmetic mean.
Presumably it would be possible to alter somewhat the weighting system
without doing violence to his concept.
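The relation between the purchasing power index and the harmonic mean of price relatives can be verified with a few figures. The sketch below is an illustration under stated assumptions: the prices and quantities are hypothetical, and the weighting by base-year values follows the description in the text rather than any published computation.

```python
from math import isclose

# Hypothetical prices and base-year quantities (illustration only).
p_o = [0.50, 2.00, 1.25]      # base-year prices, dollars per unit
p_n = [0.40, 2.50, 1.25]      # given-year prices, dollars per unit
q_o = [100.0, 30.0, 40.0]     # base-year quantities

weights = [p * q for p, q in zip(p_o, q_o)]          # base-year values p_o * q_o

# Purchasing power per dollar is v = 1/p, so v_n / v_o = p_o / p_n.
v_relatives = [(1.0 / n) / (1.0 / o) for n, o in zip(p_n, p_o)]
purchasing_power = sum(w * v for w, v in zip(weights, v_relatives)) / sum(weights)

# Harmonic mean of the price relatives p_n / p_o, weighted by base-year values.
price_relatives = [n / o for n, o in zip(p_n, p_o)]
harmonic_mean = sum(weights) / sum(w / r for w, r in zip(weights, price_relatives))

print(round(purchasing_power, 4), round(1.0 / harmonic_mean, 4))
```

The two printed figures coincide, since dividing a weight by the relative $p_n/p_o$ is the same as multiplying it by $p_o/p_n$.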

If we accept the idea that the purpose of an index number determines
its formula, we need not, necessarily, abandon the “ideal” formula. It
would be possible to maintain that, although the formula is not a perfect
solution to every index number problem, nevertheless there are purposes
for which it is especially suited, as for instance the analysis of value
changes into constituent price changes and quantity changes. However,
it seemingly would have to be abandoned as a theoretically sound index
if we take the position that every index number must answer a specific
question couched in layman’s English.

* See Wirth F. Ferger, “Distinctive Concepts of Price and Purchasing Power Index
Numbers,” Journal of the American Statistical Association, Vol. XXXII, June 1936,
pp. 258-272.
THE CHAIN INDEX

In its simplest form, the chain index is one in which the figures for each
year (or subperiod thereof) are first expressed as percentages of the pre-
ceding year. These percentages are then chained together by successive
multiplication to form a chain index. Table 18.1 shows the computation
of a weighted aggregative chain index of citrus fruit prices. As noted
above the table, the prices are weighted by production in the first year of
each pair of years. These products are summed for each year and each
sum is expressed as a percentage of the sum for the preceding year, as
shown in the next-to-the-last column of the table.

TABLE 18.1

Construction of Weighted Chain Index of Citrus Fruit Prices*

The results of the “chaining” procedure are shown in the last column of
the table. They are obtained as follows: (1) the 1949 percentage, 124.2,
is the 1949 chain index number; (2) since the 1950 percentage figure is
8.0 per cent greater than 1949, the 1950 chain index number is
1.242 × 1.080 = 1.341, or 134.1 per cent; (3) the 1951 percentage figure
is 0.943 of the 1950 figure, so the chain index number for 1951 is
1.341 × 0.943 = 1.265, or 126.5 per cent; and so on for the other years.
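The chaining step is simply a running product. The sketch below reproduces it for the three link percentages read from the passage above (124.2, 108.0 and 94.3 per cent of the preceding year); the variable names are introduced here for illustration, and later years would be appended in the same way.

```python
from itertools import accumulate
from operator import mul

# Percentages of the preceding year, as read from the passage above.
link_percentages = {1949: 124.2, 1950: 108.0, 1951: 94.3}

# Chain the links by successive multiplication (1948 = 100).
running = accumulate((pct / 100.0 for pct in link_percentages.values()), mul)
chain_index = {year: round(100.0 * value, 1)
               for year, value in zip(link_percentages, running)}

print(chain_index)
```

Carrying full precision through the multiplication, rather than rounding each link to three decimals as in the hand computation, gives the same chain index numbers to one decimal place for these three years.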

The advantages of a chain index are: (1) commodities may readily be
dropped, if they are no longer relevant; (2) new commodities may be
introduced; and (3) weights may be changed. Thus, account may
readily be taken of basic changes in production, distribution, and con-
sumption habits, of quality changes, of any hiatus in some of the data,
and of other similar changes that cannot readily be handled in a fixed-
base index number. The principle of the chain index is employed in
several instances later in this chapter.

The disadvantage of the chain index is that, while the percentage-of-
previous-year figures give accurate comparisons of year-to-year changes,
the long-range comparisons of the chained percentages are not strictly
valid. However, when the index-number user wishes to make year-to-
year comparisons, as is so often done by the business man, the percen-
tages of the preceding year provide a flexible and useful tool.

SUBSTITUTING NEW COMMODITIES AND CHANGING WEIGHTS


Sometimes it is necessary or desirable to drop a commodity from an
index, to add a new commodity, to substitute one commodity for another,
or to change the weight of a commodity. Substituting one commodity
for another will ordinarily involve also a change of weight. These
adjustments involve an application of the chain index. As an illustration
of substitution, we shall construct an index of the producers’ price of
grapefruit for the years 1948 (the base year), 1951, 1952, and 1953.

A fairly satisfactory index of the producers’ price of grapefruit can be
made, through 1951, using Florida seedless grapefruit, other Florida
grapefruit, and Texas grapefruit. However, in 1952 (that is, the 1951-
1952 season) because of a freeze, the Texas grapefruit crop amounted to
only about 200,000 boxes and the price soared to $8.89 per box. Again,
in 1953 the Texas crop was only 400,000 boxes and the price was $2.31 per
box. For the purposes of our illustration, we shall substitute Arizona
grapefruit for Texas grapefruit in 1951.

Table 18.2 shows the computation of a weighted aggregative index for
1948 and 1951 using base-year-quantity weights, and it may be seen
that the 1951 index number is 225.31 for the “old series” using Florida
and Texas grapefruit. The substitution of Arizona for Texas grapefruit
is made, in 1951, by multiplying the Arizona grapefruit price by the
Texas weight, giving the product shown in the table: 15.776 million
dollars. The total of the products for the 1951 “new series” is 53.250
million dollars, and this total is set equal to the already determined 1951
index number, 225.31. The 1952 and 1953 products for Arizona grape-
fruit are determined as was the figure for 1951, and sums of products are
obtained for 1952 and 1953. The index numbers for 1952 and 1953 are
then obtained by these relationships:
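Setting the 1951 new-series total equal to the 1951 index number suggests a simple proportion for the later years, sketched below. This is an assumption about the form of the relationships, not a transcription of them: the index for a later year is taken as 225.31 times the ratio of that year's new-series sum of products to the 1951 new-series sum of 53.250 million dollars. The sums used for 1952 and 1953 are hypothetical placeholders, since the actual sums from Table 18.2 are not given in the passage.

```python
from math import isclose

OLD_SERIES_INDEX_1951 = 225.31   # 1951 index number from the old series (from the text)
NEW_SERIES_SUM_1951 = 53.250     # 1951 new-series sum of products, million dollars (from the text)


def spliced_index(new_series_sum):
    """Index on the 1948 base for a year whose new-series sum of products is given.

    Assumes the new series is linked to the old one by simple proportion at 1951.
    """
    return OLD_SERIES_INDEX_1951 * new_series_sum / NEW_SERIES_SUM_1951


# Hypothetical new-series sums (million dollars); not the values of Table 18.2.
hypothetical_sums = {1952: 48.0, 1953: 50.0}

for year, total in hypothetical_sums.items():
    print(year, round(spliced_index(total), 2))
```

Under this assumption the 1951 figure is reproduced exactly, and every later index number moves in proportion to the new-series sum of products.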
$a$: the value of $Y_c$ when $X = 0$ in the equation $Y_c = a + bX$.

$a'$: the value of $X_c$ when $Y = 0$ in the equation $X_c = a' + b'Y$.

$a_1$: number of observed frequencies in the upper left cell of a 2 × 2 table.

$a_2$: number of observed frequencies in the lower left cell of a 2 × 2 table.

$b$: the slope of the estimating equation $Y_c = a + bX$.

$b'$: the slope of the estimating equation $X_c = a' + b'Y$.

$b_1$: number of observed frequencies in the upper right cell of a 2 × 2 table.

$b_2$: number of observed frequencies in the lower right cell of a 2 × 2 table.

$C$: coefficient of mean square contingency.

$d_x$: deviation of a cell, in terms of classes, from $X_d$.

$d_y$: deviation of a cell, in terms of classes, from $Y_d$.

$D$: difference between the ranks of paired values.

$f$: a frequency; in grouped correlation, a frequency in a cell.

$f_X$: a frequency of the $X$ series; in grouped correlation, a column fre-
quency.

$f_Y$: a frequency of the $Y$ series; in grouped correlation, a row frequency.

$k$: coefficient of alienation.

$k^2$: coefficient of non-determination.

$N$: the number of items in a sample. In two-variable correlation, $N$ is
the number of pairs of items.

$r$: coefficient of correlation.

$r^2$: coefficient of determination.

$r_{rank}$: coefficient of rank correlation.

$s_X$: standard deviation of the $X$ series.

$s_Y$: standard deviation of the $Y$ series.

$s_{Y.X}$: standard error of estimate for the estimating equation $Y_c = a + bX$.

$\Sigma$: upper-case Greek sigma, meaning “take the sum of.”

$\Sigma y^2$: total variation of the $Y$ values.

$\Sigma y_c^2$: variation of $Y$ explained by use of the estimating equation
$Y_c = a + bX$.

$\Sigma y_s^2$: variation of $Y$ unexplained by use of the estimating equation
$Y_c = a + bX$.

$x$: $X - \bar{X}$.

$X$: the $X$ series; also, an observed value in the $X$ series. Thus, we refer
to correlating $X$ and $Y$, but $\Sigma X$ means “sum the values in the $X$
series.”

$X$ axis: the horizontal axis.

$X_c$: a computed $X$ value.

$\bar{X}$: the arithmetic mean of the $X$ series.

$\chi^2$: chi-square. The symbol is a lower-case Greek chi.

$y$: $Y - \bar{Y}$. $\Sigma y^2$ is the total variation in the $Y$ series.

$y_c$: $Y_c - \bar{Y}$. $\Sigma y_c^2$ is the explained variation in the $Y$ series.

$y_s$: $Y - Y_c$. $\Sigma y_s^2$ is the unexplained variation in the $Y$ series.

$Y$: the $Y$ series; also, an observed value in the $Y$ series. Thus, we refer to
correlating $X$ and $Y$, but $\Sigma Y$ means “sum the values in the $Y$ series.”

$Y$ axis: the vertical axis.

$Y_c$: a computed $Y$ value.

$\bar{Y}$: the arithmetic mean of the $Y$ series.

$\bar{Y}_c$: the arithmetic mean of the $Y_c$ values; $\bar{Y}_c = \bar{Y}$.



CHAPTER 19


Correlation I: Two-variable Linear
Correlation


One of the objects of science is to estimate values of one factor
by reference to the values of an associated factor. “The scientific method
. . . consists in the careful and laborious classification of facts, in the
comparison of their relationship and sequences, and finally in the discov-
ery by the aid of disciplined imagination of a brief statement or formula,
which in a few words resumes a wide range of facts. Such a formula . . .
is termed a scientific law.”* When the relationship is of a quantitative
nature, the appropriate statistical tool for discovering and measuring the
relationship and expressing it in a brief formula is known as correlation.

A SIMPLE EXPLANATION

It may surprise some of us to know that there is a very close relation-
ship between temperature and the frequency with which crickets chirp.
If, for instance, we should count the number of chirps made by a cricket
in 15 seconds and add it to 37, we could closely approximate the Fahren-
heit temperature at that time. Or, if we should multiply the degrees
Fahrenheit by 3.78 and subtract 137 from the result, we could estimate
the number of chirps to be expected from a cricket in one minute. This
relationship would be found remarkably accurate, unless the temperature
was below 45°. When the weather is colder than 45° crickets do not
chirp. Likewise, it might not be accurate appreciably beyond 75°, since
observations have not been made beyond that temperature, and we do not
know, therefore, if the relationship holds for higher temperatures.

The relationship between these two variables, temperature and cricket
chirps, is displayed in Chart 19.1, known as a scatter diagram. Each dot
represents an observation of one cricket. Thus, observation A represents
a cricket which, at a temperature of 59.0°, chirped 85 times per minute.

* Karl Pearson, The Grammar of Science, p. 77. Adam and Charles Black, London,
1900.
The reader should notice that temperature is plotted along the X-axis,
while chirps per minute are plotted along the Y-axis. This is done
because the number of chirps per minute appears to be a direct result of
the temperature. In this case it is also true that we wish to estimate the
number of chirps to be expected at a given temperature; temperature is
therefore the independent variable, and chirps per minute the dependent
variable. Even though it were temperature we wished to estimate, it
would nevertheless be best to show the causal factor on the X-axis.
When the causal relationship is not clear, or when neither factor can be
said to be the cause of the other, then the variable to be estimated should
be plotted on the Y-axis.

Chart 19.1. Temperature and Chirps per Minute of 115
Crickets. Data provided by Mr. Bert E. Holmes. (Vertical axis: chirps
per minute; horizontal axis: temperature, degrees Fahrenheit.)

Judging from Chart 19.1, we see that the relationship between the two
variables is linear, for the straight line appears to be as good a fit as a
more complicated curve. The equation of this line* is

\[
Y_c = -137.22 + 3.777X.
\]

* This equation was fitted by the authors to data furnished by Bert E. Holmes. See
also Bert E. Holmes, “Vocal Thermometers,” The Scientific Monthly, Vol. XXV, Sep-
tember 1927, pp. 261-264.
From this equation, estimates of chirps can be made for any desired tem-
perature within the limits of the observations shown on the chart. Thus,
if we wish to estimate the number of chirps when the temperature is 59.0°
(observation A), we find the number by substituting 59.0 for $X$ in the
equation. Thus

\[
Y_c = -137.22 + (3.777)(59.0) = 86 \text{ chirps}.
\]

Chart 19.2. A Scatter Diagram Illustrating Perfect Linear Corre-
lation. The correlation would also be perfect if the line on which the dots
lie had a negative, instead of a positive, slope. From F. E. Croxton,
Elementary Statistics with Applications in Medicine, Prentice-Hall, Inc.,
New York, 1953, p. 112.

Chart 19.3. A Scatter Diagram Illustrating No Correlation. Var-
ious other arrangements of dots are possible which will also show no cor-
relation. From same source as Chart 19.2.

The estimate could be read, although less accurately, directly from the
estimating line plotted on the chart. Although the estimate (86) does not
agree perfectly with the actual observation of 85 chirps, the discrepancy
is not large.
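The substitution above is easily put into a small routine. The sketch below uses only the fitted equation $Y_c = -137.22 + 3.777X$ from the text; the function names are introduced here for illustration, and, as the text cautions, the estimates should not be trusted below about 45° or appreciably beyond 75°.

```python
A = -137.22   # intercept of the estimating equation (from the text)
B = 3.777     # slope: additional chirps per minute per degree Fahrenheit (from the text)


def chirps_per_minute(temperature_f):
    """Estimated chirps per minute at a given Fahrenheit temperature."""
    return A + B * temperature_f


def temperature_for_chirps(chirps):
    """Temperature at which the estimating line gives the stated chirps per minute."""
    return (chirps - A) / B


estimate = chirps_per_minute(59.0)
print(round(estimate))                                # estimate for observation A
print(round(temperature_for_chirps(estimate), 1))     # recovers 59.0
```

The second function merely inverts the line algebraically; it is not the separately fitted equation for estimating temperature from chirps, which would in general have different constants.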

We cannot fail to be impressed with the adequacy of the generalization
expressed in the equation $Y_c = -137.22 + 3.777X$. Since most of the
dots are very close to the line, it appears that frequency of chirps has
been adequately explained by reference to temperature. The slight
variations from the estimating line are unexplained and may be due to
differences between individual crickets, differences associated with the
time of day or year in which the observations were made, humidity, and
inaccuracies of observation of temperature or number of chirps. Also,
the temperature at the spot where the cricket is chirping may be different
from that where the observer is standing. This might be the case if the
cricket were under a stone. An examination of other causes of variation,
in addition to temperature, involves consideration of three or more vari-
ables, a procedure for which will be considered in Chapter 21 under the
heading of “Multiple Correlation.”

The closeness of the relationship may be expressed in general terms by
stating that the coefficient of correlation, $r$, is +0.9919. Since ±1.0 is per-
fect correlation (see Chart 19.2) and 0 is no correlation (see Chart 19.3),
it should be obvious that one almost never finds a higher coefficient
than +0.9919. The plus sign indicates that the correlation is positive,
that is, that the chirps increase as the temperature increases. Had chirps
decreased with increasing temperature, the correlation would have been
negative, or inverse; the sign of $r$ would have been negative, as would the
sign of $b$ in the estimating equation; and the estimating line would have
sloped downward to the right.

An illustration of rather low correlation (+0.11) is given by Chart 19.4.
In this case, brain weight was estimated by cranial capacity, and legisla-
tive ability by a rather complicated system of scoring. But even if we
assume that all measurements are accurate, the evidence certainly does
not suggest that legislators should be selected solely from head measure-
ments. Perhaps there are additional factors which account for legislative
ability; for example, intelligence, education, initiative, honesty, social
awareness, and other traits are doubtless important.

CORRELATION THEORY

Correlation may be thought of as involving three types of measure-
ments, which may conveniently be made in the following order:

(1) An estimating, or regression,* equation which describes the functional
relationship between the two variables. As the name indicates, one
object of such an equation is to make estimates of one variable from
another.

(2) A measure of the divergence of the actual values of the dependent
variable from their estimated or computed values. This measure
is analogous to a standard deviation and gives an idea, in absolute terms,
of the dependability of estimates. It is called the standard error of estimate.

(3) A measure of the degree of relationship, or correlation ($r$), between
the variables, independent of the units or terms in which they were origi-
nally expressed. The square of this measure ($r^2$) enables us to state
the relative amount of variation in the dependent variable which has been
explained by the estimating equation.

* The term “regression” entered statistical literature as a result of the use of correla-
tion by Galton to study biological regression (that is, the tendency to revert to a com-
mon type or average). Since correlation analysis is applied to many types of prob-
lems, the term “estimating” seems more appropriate.
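The three measurements listed above can be obtained together from a set of paired values. The sketch below is a minimal illustration under stated assumptions: the five pairs are hypothetical, the function name is introduced here, and the standard error of estimate is computed with $N$ in the denominator (some texts divide by $N - 2$ instead).

```python
from math import sqrt, copysign, isclose


def correlation_measures(xs, ys):
    """Return the estimating equation, standard error of estimate, r and r squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)        # total variation of Y

    b = sum_xy / sum_x2                                 # slope of Y_c = a + bX
    a = mean_y - b * mean_x                             # value of Y_c when X = 0

    # Unexplained variation: squared deviations of Y from the computed values.
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = sqrt(unexplained / n)                        # assumption: divisor N

    r_squared = 1.0 - unexplained / sum_y2              # share of variation explained
    r = copysign(sqrt(max(r_squared, 0.0)), b)          # r takes the sign of b
    return {"a": a, "b": b, "s_yx": s_yx, "r": r, "r_squared": r_squared}


# Hypothetical paired observations (illustration only).
measures = correlation_measures([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print({name: round(value, 4) for name, value in measures.items()})
```

The function assumes that neither series is constant, since a constant series would make one of the sums of squared deviations zero.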

Chart 19.4. Estimates of Brain Weight and Legislative
Ability of 89 Members of Congress. Data from “Brain Weight
and Legislative Ability in Congress,” by Arthur MacDonald, Con-
gressional Record, April 12, 1932.

The estimating equation. Foresters sometimes find it convenient
to estimate the height growth of trees from their growth in diameter, since
this procedure is quicker than direct measurements of the growth in
height. The scatter diagram, Chart 19.5, shows the breast-high diameter
growth and the growth in height of 20 trees, together with the estimating
line which describes the nature of the relationship between the two vari-
ables. This straight line has been so fitted that the sum of the squares of
the $Y$ deviations from it is less than those from any other straight line.
A curve fitted in this manner is usually considered by statisticians to be
the best with which to estimate values of one variable when values of the
other variable are known. The fitting of such a line is similar to the
fitting of a trend, and requires the use of the following normal equations:

I. $\Sigma Y = Na + b\Sigma X$.

II. $\Sigma XY = a\Sigma X + b\Sigma X^2$.

It will be remembered that the normal equations were discussed in
Chapter 12.

Chart 19.5. Breast-High Diameter Growth and
Height Growth of 20 Forest Trees. Data of Table 19.1. (Vertical axis:
height growth in feet; horizontal axis: diameter growth in inches.)

Table 19.1 shows the computations that are necessary to determine the
values which must be substituted. The substitution yields:

I. 173 = 20a + 90.7b.

II. 856.0 = 90.7a + 453.93b.

Multiplication of all the items in Equation I by 4.535 permits us to cancel
out a by subtracting Equation I from Equation II. Thus

II.            856.0   = 90.7a + 453.93b.

(I × 4.535).   784.555 = 90.7a + 411.3245b.

                71.445 = 42.6055b.

                     b = 1.676896.

We may now substitute the value of b in Equation I in order to find the
value of a.

I. 173 = 20a + 152.094467.
     a = 1.045277.

TABLE 19.1


Determination of Values Used in Computing Estimating Equa-
tion for Growth in Diameter and Height of 20 Forest Trees

Rank in            Diameter       Height
diameter           growth at      growth
growth             breast         in
(smallest          height         feet
to largest)        in inches
                   X              Y           XY         X²         Y²

 1                 2.3             7          16.1       5.29        49
 2                 2.5             8          20.0       6.25        64
 3                 2.6             4          10.4       6.76        16
 4                 3.1             4          12.4       9.61        16
 5                 3.4             6          20.4      11.56        36
 6                 3.7             6          22.2      13.69        36
 7                 3.9            12          46.8      15.21       144
 8                 4.0             8          32.0      16.00        64
 9                 4.1             5          20.5      16.81        25
10                 4.1             7          28.7      16.81        49
11                 4.2             8          33.6      17.64        64
12                 4.4             7          30.8      19.36        49
13                 4.7             9          42.3      22.09        81
14                 5.1            10          51.0      26.01       100
15                 5.5            13          71.5      30.25       169
16                 5.8             7          40.6      33.64        49
17                 6.2            11          68.2      38.44       121
18                 6.9            11          75.9      47.61       121
19                 6.9            16         110.4      47.61       256
20                 7.3            14         102.2      53.29       196

Total             90.7           173         856.0     453.93     1,705


Data from Donald Bruce and F. X. Schumacher, Forest Mensuration, p. 124,
McGraw-Hill Book Company, New York, First Edition, 1935. Courtesy of Publisher
and Authors.

The values for a and b are checked by substituting in Equation II. While
this does not prove that no errors in computation have been made, yet
if the correct numbers were substituted in the two normal equations,
either no errors, or counterbalancing errors, have been made. Since
a = 1.045 and b = 1.677, the equation of the line which enables us to
estimate the growth in height of trees in this particular forest when their
growth in diameter is known may be stated as

$Y_c = 1.045 + 1.677X.$

Suppose now we wish to estimate the height growth of a tree which
grew 5.5 inches in diameter. Substituting in the equation, we have

$Y_c = 1.045 + (1.677)(5.5),$

$= 10.268$ feet.

Dependability of estimates. However, we should not expect all
trees which grew 5.5 inches in diameter to have grown exactly 10.268 feet
in height, for the dots of the scatter diagram do not all lie on the fitted line.
Rather, 10.268 should be thought of as an estimate of the average height
growth of all trees of the diameter growth indicated. We should expect
variations from this value the same as from the arithmetic mean of a
frequency distribution. It is therefore pertinent to inquire what proportion
of trees may be expected to fall within any range of error in which
we may be interested, assuming, of course, that we have a representative
sample.

To do this, it is necessary to compute the standard deviation of the Y
values, not from their mean, but from the line of estimation. On Chart
19.6, the vertical distance from the line of estimate to any Y value
represents the difference between the observed value and the estimated Y
value. The estimated Y values, $Y_c$, are obtained by solving the estimating
equation for each measurement of diameter growth, or X value. The
deviation $Y - Y_c$ represents the error that would have been made in one
particular instance. To obtain a summary measure of those deviations,
they may be squared, summed, divided by N, and the square root
extracted. This is the standard error of estimate, the symbol for which is
$s_{Y.X}$. Its formula may be written

$s_{Y.X} = \sqrt{\dfrac{\Sigma (Y - Y_c)^2}{N}}.$

In this illustration

$s_{Y.X} = \sqrt{\dfrac{88.75}{20}} = 2.107$ feet.

Calculations are shown in Table 19.2, Columns 7 and 10. Ordinarily the
more expeditious method of calculation, which is explained on page 408,
would be used. The above method is used solely to explain the meaning
of the measure.

This measure may be interpreted in a manner strictly analogous to that
of the standard deviation of a frequency distribution. It yields an
estimate of the range above and below the line of estimation within which
68.27 per cent of the items may be expected to fall if the scatter is normal.
In practice we frequently think of this measure as the range within which
about 2/3 of the values will be found. For the case in hand ($s_{Y.X} = 2.107$),
we may expect to find about 2/3 of the items of Chart 19.6 within the narrow
band $\pm s_{Y.X}$ shown in the diagram; about 95 per cent (ideally 95.45)
within the wider band that includes $\pm 2s_{Y.X}$; and practically all within
$\pm 3s_{Y.X}$ (theoretically, with a large number of items, 99.73 per cent of the
cases). A count of the dots shows that within $\pm s_{Y.X}$ of the line of
estimate, 13 of the 20 items (65 per cent) are found; within $\pm 2s_{Y.X}$ of the
line, 19 of the items (95 per cent) appear; and within $\pm 3s_{Y.X}$ are included
all 20 of the items. The slight discrepancies may have been due to the fact
that the sample was small and the scatter not normally distributed
around the estimating equation.

* Although this measure is called the "standard error of estimate," it is not a
standard error in the sense used in Chapters 24 and 25. $s_{Y.X}$ is the standard
deviation of the Y values around the estimating equation $Y_c = a + bX$.

Chart 19.6. Estimating Equation and Zones of ±1, ±2, and ±3 Standard Errors
of Estimate, for Diameter Growth and Height Growth of 20 Forest Trees.
Data of Table 19.2. (Vertical axis: height growth in feet.)
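
The count of dots inside each band can be reproduced from the twenty pairs of Table 19.1. This is a minimal Python sketch under that assumption; the helper `fit_line` and the variable names are illustrative, and the divisor N (not N − 2) follows the definition of the standard error of estimate used in this chapter.

```python
from math import sqrt

# Diameter growth in inches (X) and height growth in feet (Y), Table 19.1.
X = [2.3, 2.5, 2.6, 3.1, 3.4, 3.7, 3.9, 4.0, 4.1, 4.1,
     4.2, 4.4, 4.7, 5.1, 5.5, 5.8, 6.2, 6.9, 6.9, 7.3]
Y = [7, 8, 4, 4, 6, 6, 12, 8, 5, 7,
     8, 7, 9, 10, 13, 7, 11, 11, 16, 14]


def fit_line(xs, ys):
    """Least-squares constants a and b of Yc = a + bX."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    b = sum_xy / sum_x2
    return mean_y - b * mean_x, b


a, b = fit_line(X, Y)
residuals = [y - (a + b * x) for x, y in zip(X, Y)]  # Y - Yc for each tree

# Standard error of estimate: square, sum, divide by N, take the square root.
s_yx = sqrt(sum(e * e for e in residuals) / len(residuals))

# Number of trees whose deviation lies within 1, 2, and 3 standard errors.
counts = [sum(abs(e) <= k * s_yx for e in residuals) for k in (1, 2, 3)]
print(round(s_yx, 3), counts)
```

Run on these data, the sketch gives a standard error of about 2.106 feet (unrounded constants, hence the small difference from 2.107) and counts of 13, 19, and 20 items, in agreement with the count of dots reported above.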

Although the standard error of estimate is a measure of the dispersion
of all of the Y values around the estimating equation, and is therefore a
general or over-all measure of dispersion, it is nevertheless often used to
indicate the dependability of specific estimates. It was calculated that
trees with growth in diameter of 5.5 inches should average 10.268 feet
in height growth. We may now amplify the statement by saying that,
if our sample is representative, about 2/3 of such trees should vary in height
growth between 8.16 feet and 12.38 feet (10.268 ± 2.107); or, considering
a slightly wider range, about 95 out of 100 should lie between 6.05 feet
and 14.48 feet. The proportion lying within any other range could
readily be computed also by referring to Appendix E.

TABLE 19.2

Computation of Total Variation, Explained Variation, and Unexplained
Variation, for Height Growth of 20 Forest Trees as Estimated by Their
Diameter Growth

These statements concerning range of error have to do, not with cer- 
tainty, but only with expectation. We have used only 20 items, and, 
even though the sample may have been carefully chosen, another sample 
of 20 would not give us precisely the same results as those obtained above. 
It might be that we could reduce uncertainty further, not only by increas- 
ing the size of our sample, but also by comparing variations in height 
growth with some other factor in addition to diameter growth — for 
example, age, since as trees grow older their rate of growth may change.
Also, the character and quantity of plant food in the soil and the degree
of crowding of the trees might be considered. Even if several factors in
addition to diameter growth were considered (this is multiple correlation,
discussed in Chapter 21), there would still be some unexplained variations,
and therefore still some uncertainty. 

The correlation coefficient and explained variation. Another
measure closely related to the estimating equation and to the standard
error of estimate is the coefficient of correlation r. The estimating
equation $Y_c = a + bX$ is a statement of the way in which the dependent
variable changes with variations in the independent variable. $s_{Y.X}$ is an
indication of the amount of dispersion in the dependent variable which we
have failed to account for by our line of estimation, but it is stated in
terms of the original data; in the case of the diameter-growth and
height-growth data, in feet. When stating the degree of relationship between
two variables, it is convenient to be able to employ concise numerical
terms which are independent of the units of the original data and to
express the degree of relationship between two series even if we do not know
either the equation of the line of estimation or $s_{Y.X}$. To be sure,
something is lost by so compressing the information, since it does not enable
us to make an estimate of the value of one variable from the other, or
to tell, in absolute magnitude, the degree of accuracy of any estimate
we may make. But something is gained, too, since one coefficient can be
compared with any other, regardless of the subject matter of the different
correlations. As has been stated, the coefficient of correlation is a
number varying from +1, through zero, to −1. The sign indicates whether
the slope of the line of relationship is positive or negative, while the
magnitude of the coefficient indicates the degree of association. When
there is absolutely no relationship between the variables, r is 0.

A clear understanding of the meaning of the coefficient of correlation
is given by the following approach. One measure of variability, called
variation or total variation, is the sum of the squares of the deviations of
the Y values from their mean, $\Sigma (Y - \bar{Y})^2$. This total variation can be
broken up into two parts: (1) that which has been explained by our line of
relationship, and (2) that which we have failed to explain. The total
variation in height growth of the trees of our distribution, as indicated
by the calculations in Column 8 of Table 19.2, is 208.55. The amount of
variation which we have explained by our line of relationship is the sum
of the squares of the deviations of the estimated Y values from their own
mean (which is also the mean of the original Y values, as may be seen by
dividing the totals of Columns 3 and 4 of Table 19.2 by N); that is,
$\Sigma (Y_c - \bar{Y})^2$. The explained variation is shown in Column 9 of Table
19.2 to be 119.81. The unexplained variation is the sum of the squares
of the deviations of the Y values from their estimated values,
$\Sigma (Y - Y_c)^2$. The unexplained variation is shown in Column 10 of Table 19.2
to be 88.75.

Let us summarize our findings:

Variation      Symbol and formula                           Amount of     Per cent of
                                                            variation*    total variation

Unexplained    $\Sigma y_s^2 = \Sigma (Y - Y_c)^2$             88.75           42.6
Explained      $\Sigma y_c^2 = \Sigma (Y_c - \bar{Y})^2$      119.81           57.4
Total          $\Sigma y^2 = \Sigma (Y - \bar{Y})^2$          208.55          100.0

* Because of rounding in Table 19.2 the two components slightly exceed the
total. Later it will be seen that $\Sigma y_s^2 = 88.74$.


It will be seen that we have explained 57.4 per cent of the variation
in the dependent variable. Expressed as a ratio to one, 0.574, this is the
coefficient of determination, $r^2$. The coefficient of correlation, r, is the
square root of the coefficient of determination and has a value of +0.758
(the sign being the same as that of b), and may be thought of as the
square root of the proportion of the total variation in the dependent
variable that has been explained by use of the estimating equation. r
will, of course, always be numerically larger than $r^2$ unless $r^2 = 0$ or 1.0,
when $r = r^2$. One outstanding advantage of the foregoing method of
explaining the coefficient of determination and the coefficient of correlation
is that the concept will also serve to explain non-linear and multiple
coefficients, which are discussed in Chapters 20 and 21.
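
The split of total variation into explained and unexplained parts can be checked directly on the data of Table 19.1. The sketch below is a minimal Python illustration under that assumption; the variable names are illustrative, and the sign of r is taken from b as described above.

```python
from math import copysign, sqrt

# Diameter growth in inches (X) and height growth in feet (Y), Table 19.1.
X = [2.3, 2.5, 2.6, 3.1, 3.4, 3.7, 3.9, 4.0, 4.1, 4.1,
     4.2, 4.4, 4.7, 5.1, 5.5, 5.8, 6.2, 6.9, 6.9, 7.3]
Y = [7, 8, 4, 4, 6, 6, 12, 8, 5, 7,
     8, 7, 9, 10, 13, 7, 11, 11, 16, 14]

n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n

# Least-squares constants of the estimating equation Yc = a + bX.
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y)) / \
    sum((x - mean_x) ** 2 for x in X)
a = mean_y - b * mean_x
y_c = [a + b * x for x in X]

total = sum((y - mean_y) ** 2 for y in Y)                  # total variation
explained = sum((yc - mean_y) ** 2 for yc in y_c)          # explained variation
unexplained = sum((y - yc) ** 2 for y, yc in zip(Y, y_c))  # unexplained variation

r_squared = explained / total            # coefficient of determination
r = copysign(sqrt(r_squared), b)         # r takes the sign of b
print(round(total, 2), round(explained, 2), round(unexplained, 2), round(r, 3))
```

With unrounded constants the two components add exactly to the total, and the unexplained part comes out near 88.74 instead of the 88.75 obtained from the rounded columns of Table 19.2.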

It may be helpful to some readers to be able to visualize the information
of Table 19.2. Chart 19.7 shows, for the data of height and diameter
growth:

A. The deviations of the actual Y values from their mean.

B. The deviations of the computed Y values from their mean. (Note again
that $\bar{Y}_c = \bar{Y}$.)

C. The deviations of the actual Y values from the computed Y values.

Chart 19.7. Total Deviations, Explained Deviations, and Unexplained
Deviations for Height Growth of 20 Forest Trees as Explained by their
Diameter Growth. Data of Table 19.2.

The proportion of variation which has been explained was 0.574. The
proportion which we failed to explain was 0.426. This is the coefficient of
non-determination, $k^2$. Note that under all conditions $r^2 + k^2 = 1.0$.
Note also that the maximum possible value for $r^2$ is 1.0 (when r is also
1.0); this would occur if all of the dots of the scatter diagram were on the
line of estimation, as in Chart 19.2. If no variation were explained, $r^2$
(and r) would be zero, since the estimating equation would coincide with
$\bar{Y}$.

As can be seen from Table 19.2, or from the summary of findings, total
variation equals explained variation plus unexplained variation:

$\Sigma y^2 = \Sigma y_c^2 + \Sigma y_s^2,$

$208.55 = 119.81 + 88.75.$

The equation may also be written

$\Sigma y_s^2 = \Sigma y^2 - \Sigma y_c^2.$

As computed in the preceding paragraphs,

$r^2 = \dfrac{\Sigma y_c^2}{\Sigma y^2},$

but we can also write

$r^2 = 1 - \dfrac{\Sigma y_s^2}{\Sigma y^2} = 1 - \dfrac{88.75}{208.55} = 0.574,$

which is the same value obtained before.

* See Appendix S, section 19.1, Equation 2.

* While $r^2 + k^2 = 1.0$, $r + k > 1.0$ unless $r = 1.0$ or 0. k is called
the coefficient of alienation.

* For algebraic proof, see Appendix S, section 19.1, Equation 7.

It was mentioned parenthetically, on page 402, that the sign of r is the
same as the sign of b in the estimating equation. The sign of r can also
be determined from inspection of the scatter diagram, unless the
correlation is very low. The methods previously described for determining the
value of $r^2$ or r were presented to explain the meaning of the coefficients.
They are too laborious to employ for day-to-day computations. Other
formulas, more useful for purposes of calculation, will be given further
on in this chapter.

* Taking the square root gives the correlation coefficient:

$r = \sqrt{1 - \dfrac{\Sigma y_s^2}{\Sigma y^2}} = \sqrt{1 - \dfrac{s_{Y.X}^2}{s_Y^2}}.$

Reference will be made to this last expression later in the chapter.

* The correlation coefficient may also be explained in this manner: If the two
variables X and Y are thought of as being composed of elements equally likely to be
present in any item (some of which are common to X and Y, but some of which occur
in the one and not the other), then the coefficient of determination of the entire
population is the product of the two proportions of common elements, and the
coefficient of correlation is their geometric mean. Let us take 5 disks (elements)
marked on one side as follows (the other side being blank): two disks marked with
both X and Y, two marked with X only, and one marked with Y only.
If we should throw all 5 disks into the air, when they fall, any number of X's from
0 to 4 might appear, and also Y's from 0 to 3. Whenever an X appears, the chances
that a Y will also appear on the same disk are 2 out of 4; likewise, whenever a Y
appears, the chances are 2 out of 3 that an X will appear on the same disk. If we
should throw these disks into the air a number of times, counting the X's and Y's
each time, there would be correlation between the number of X's that appear from
throw to throw and the number of Y's. The most likely value of $r^2$ is
$\frac{2}{4} \times \frac{2}{3} = 0.333$, while the most likely value of r is
$\sqrt{\frac{2}{4} \times \frac{2}{3}} = +0.58$. The larger the number of throws,
the greater will be the tendency for r to approach this value. For a demonstration
of this, see F. E. Croxton and D. J. Cowden, Practical Business Statistics,
Prentice-Hall, Inc., New York, 1934 (first edition), pp. 416-419.
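
The disk illustration in the footnote above can be checked by listing every way the five disks can fall. The sketch below is a minimal Python illustration; it assumes, as the footnote implies, two disks bearing both X and Y, two bearing X only, and one bearing Y only, and it further assumes that each disk lands marked side up with probability one-half. The function name is hypothetical.

```python
from itertools import product
from math import sqrt

# Assumed disk markings: two disks carry both X and Y, two carry X only,
# and one carries Y only.
DISKS = [("X", "Y"), ("X", "Y"), ("X",), ("X",), ("Y",)]


def disk_model_correlation(disks):
    """Exact correlation between the number of X's and Y's showing, taken over
    all equally likely outcomes (each disk marked side up with chance 1/2)."""
    outcomes = list(product([0, 1], repeat=len(disks)))  # 1 = marked side up
    n = len(outcomes)
    x_counts = [sum(up for up, d in zip(o, disks) if "X" in d) for o in outcomes]
    y_counts = [sum(up for up, d in zip(o, disks) if "Y" in d) for o in outcomes]
    mean_x, mean_y = sum(x_counts) / n, sum(y_counts) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_counts, y_counts))
    sum_x2 = sum((x - mean_x) ** 2 for x in x_counts)
    sum_y2 = sum((y - mean_y) ** 2 for y in y_counts)
    return sum_xy / sqrt(sum_x2 * sum_y2)


r = disk_model_correlation(DISKS)
print(round(r * r, 3), round(r, 2))  # 0.333 0.58
```

Enumerating the 32 outcomes replaces the repeated throws with their long-run limit, which is why the result equals the product of the two proportions of common elements exactly.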

The product-moment formula. The coefficient of correlation may
be approached from a number of different points of view. As noted
before, the explanation already given is particularly enlightening, since
essentially the same idea can be applied to curvilinear and to multiple
correlation. But the following explanation is also simple and, for certain
purposes, extremely useful.

In the estimating equation, b tells us the normal amount by which the
dependent variable changes with a change of one unit in the independent
variable. It is the slope or $y/x$ ratio of any point on the estimating
equation, when y and x are defined as deviations from the mean of the series,
so that the estimating equation becomes $y_c = bx$, and b is obtained by
finding the value of $\dfrac{\Sigma xy}{\Sigma x^2}$. Although this constant b is
essential for purposes of estimation, still it cannot tell us the degree of
relationship between the variables, since they are not directly comparable
with each other. The X series and the Y series do not have the same dispersion,
and they may even be in different physical units. However, comparability
between the terms of the ratio $y/x$ can be obtained by dividing the
numerator by $s_Y$ and the denominator by $s_X$, or by dividing the entire
expression by $\dfrac{s_Y}{s_X}$. Thus, b is transformed into r as follows:

$r = \dfrac{\Sigma xy}{\Sigma x^2} \div \dfrac{s_Y}{s_X} = \dfrac{\Sigma xy}{\Sigma x^2} \cdot \dfrac{s_X}{s_Y} = \dfrac{(\Sigma xy)(s_X)}{N s_X^2 s_Y} = \dfrac{\Sigma xy}{N s_X s_Y}.$

In either of the last two forms, the ratio is known as the product-moment
form of the coefficient of correlation. Thus it may be seen that r is
merely the slope of the estimating equation when both numerator and
denominator are in standard deviation units.

Now, since

$r = b \div \dfrac{s_Y}{s_X},$

$b = r\,\dfrac{s_Y}{s_X}$

and

$y_c = r\,\dfrac{s_Y}{s_X}\,x.$

Use of the estimating equation in this form will be made later in the
chapter.

* See Appendix S, section 19.2.

* Another way of getting the same result is to think of r as a special case of b;
namely, when the original data have been made comparable by expressing them in
units of their own standard deviations. Thus,

$b = \dfrac{\Sigma xy}{\Sigma x^2}$ becomes $r = \dfrac{\Sigma \left(\dfrac{x}{s_X}\right)\left(\dfrac{y}{s_Y}\right)}{\Sigma \left(\dfrac{x}{s_X}\right)^2} = \dfrac{\Sigma xy}{s_X s_Y} \cdot \dfrac{s_X^2}{\Sigma x^2} = \dfrac{\Sigma xy}{N s_X s_Y}.$

* The formula is often stated as $r = \dfrac{\Sigma xy / N}{s_X s_Y}$. The reason for
the adjective product-moment becomes clear when it is realized that the word moment
refers to the average of some power of the deviations from a mean. Thus, r is the
first moment of the product of the variables when each has been previously stated in
terms of its own standard deviation. For proof, see Appendix S, section 19.3.

* No previous mention has been made of the estimating equation $X_c = a' + b'Y$,
which minimizes the squared horizontal deviations. For this equation, the normal
equations are:

I. $\Sigma X = Na' + b'\,\Sigma Y,$

II. $\Sigma XY = a'\,\Sigma Y + b'\,\Sigma Y^2.$

In the form $x_c = b'y$, $b' = \dfrac{\Sigma xy}{\Sigma y^2}$ and $x_c = r\,\dfrac{s_X}{s_Y}\,y$.

In the portions of this text dealing with linear correlation, we shall give exclusive
attention to problems involving the estimating equation $Y_c = a + bX$. There are
situations in which the estimating equation $X_c = a' + b'Y$ is appropriate and still
other situations calling for estimating equations differing from either of these. For a
discussion, see "One Line or Two," by W. N. Jessop, Applied Statistics, A Journal of
the Royal Statistical Society, Vol. I, No. 2, June 1952, pp. 131-137. Eight references
are given at the end of this article.


PRACTICAL METHODS OF COMPUTATION

The previous illustration involved a limited number of paired items in
order to illustrate the theory of correlation as concisely as possible. In
most practical problems, however, we have a large number of pairs of
items. In practice, therefore, it is advisable to modify the foregoing
methods slightly in order to save time.

As a preliminary step in a correlation problem, a scatter diagram
should always be drawn. If only an approximate idea of the degree of
relationship is required, inspection of the scatter plot yields satisfactory
results. After a little experience in correlating, the statistician may be
able to make surprisingly close estimates of r, by inspection, from the
scatter diagram, and these may be good enough to help him to detect
gross mistakes in computations of r. The scatter diagram may
frequently be used for exploratory purposes and may occasionally yield
sufficient information to eliminate the need for determining the coefficient
of correlation.

We have already seen that

$b = \dfrac{\Sigma xy}{\Sigma x^2}.$

Since the first normal equation is

$\Sigma Y = Na + b\,\Sigma X,$

$\dfrac{\Sigma Y}{N} = a + b\,\dfrac{\Sigma X}{N};$ and

$a = \bar{Y} - b\bar{X}.$

From these expressions, a and b may be obtained without solving the two
normal equations simultaneously. We must, however, compute:

$\bar{X} = \dfrac{\Sigma X}{N} = \dfrac{90.7}{20} = 4.535.$

$\bar{Y} = \dfrac{\Sigma Y}{N} = \dfrac{173}{20} = 8.65.$

$\Sigma xy = \Sigma XY - \bar{X}\,\Sigma Y,$
$= 856.0 - (4.535)(173) = 71.445.$

$\Sigma x^2 = \Sigma X^2 - \bar{X}\,\Sigma X,$
$= 453.93 - (4.535)(90.7) = 42.6055.$

$\Sigma y^2 = \Sigma Y^2 - \bar{Y}\,\Sigma Y,$
$= 1,705 - (8.65)(173) = 208.55.$

The last summation will be needed later.
Then we obtain

$b = \dfrac{71.445}{42.6055} = 1.676896;$

$a = \bar{Y} - b\bar{X} = 8.65 - (1.676896)(4.535),$
$= 1.045277,$

giving the estimating equation

$Y_c = 1.045 + 1.677X.$

Next we compute $\Sigma Y_c^2$ by use of the expression

$\Sigma Y_c^2 = a\,\Sigma Y + b\,\Sigma XY,$
$= (1.045277)(173) + (1.676896)(856.0),$
$= 1,616.26,$

and $\Sigma y_s^2$ from

$\Sigma y_s^2 = \Sigma Y^2 - \Sigma Y_c^2,$
$= 1,705 - 1,616.26 = 88.74.$

We may compute either

$\Sigma y_c^2 = a\,\Sigma Y + b\,\Sigma XY - \bar{Y}\,\Sigma Y,$
$= (1.045277)(173) + (1.676896)(856.0) - (8.65)(173),$
$= 119.81,$

or

$\Sigma y_c^2 = b\,\Sigma xy,$
$= (1.676896)(71.445) = 119.81,$

and obtain $\Sigma y_s^2$ from the alternative expression

$\Sigma y_s^2 = \Sigma y^2 - \Sigma y_c^2,$
$= 208.55 - 119.81 = 88.74.$

A convenient formula for obtaining $s_{Y.X}$ is

$s_{Y.X}^2 = \dfrac{\Sigma y_s^2}{N} = \dfrac{88.74}{20},$

and

$s_{Y.X} = 2.106$ feet.

The coefficient of correlation is then obtained by the usual expression

$r^2 = \dfrac{\Sigma y_c^2}{\Sigma y^2} = \dfrac{119.81}{208.55} = 0.574,$

and

$r = +0.758.$

If preferred, r may be obtained by use of one of the expressions given in
footnote 8.

* For proof of the expressions for the summations, see footnote 3 in Chapter 21.

* Proof that $\Sigma Y_c^2 = a\,\Sigma Y + b\,\Sigma XY$ is given in Appendix S, section 19.1.
Proof that $\Sigma y_c^2 = a\,\Sigma Y + b\,\Sigma XY - \bar{Y}\,\Sigma Y$ is given in the same
section, Equation 5. For proof that $\Sigma y_c^2 = b\,\Sigma xy$, see Equation 6. For proof
that $\Sigma y_s^2 = \Sigma y^2 - \Sigma y_c^2$, see Equation 7.
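
The shorter procedure of this section needs nothing beyond the totals of Table 19.1. The following minimal Python sketch retraces it under that assumption; the variable names are illustrative, and the last lines add a cross-check against the product-moment form given earlier in the chapter.

```python
from math import sqrt

# Totals of Table 19.1.
N, SUM_X, SUM_Y, SUM_XY, SUM_X2, SUM_Y2 = 20, 90.7, 173, 856.0, 453.93, 1705

mean_x, mean_y = SUM_X / N, SUM_Y / N

# Sums of products and squares of deviations, from the original sums.
dev_xy = SUM_XY - mean_x * SUM_Y   # sum of xy
dev_x2 = SUM_X2 - mean_x * SUM_X   # sum of x squared
dev_y2 = SUM_Y2 - mean_y * SUM_Y   # sum of y squared (total variation)

b = dev_xy / dev_x2
a = mean_y - b * mean_x

explained = b * dev_xy             # explained variation
unexplained = dev_y2 - explained   # unexplained variation
s_yx = sqrt(unexplained / N)       # standard error of estimate
r_squared = explained / dev_y2     # coefficient of determination

# Cross-check: product-moment form, sum of xy over N times both standard deviations.
s_x, s_y = sqrt(dev_x2 / N), sqrt(dev_y2 / N)
r_product_moment = dev_xy / (N * s_x * s_y)

print(round(dev_xy, 3), round(dev_x2, 4), round(dev_y2, 2))
print(round(explained, 2), round(unexplained, 2), round(r_product_moment, 3))
```

The product-moment value squared agrees with the ratio of explained to total variation, which is the sense in which the two approaches to r are equivalent.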

If all that is wanted is the value of r, it is most expeditious to make use
of a formula which does not call for the value of a or b. It has previously
been noted that

$r = \dfrac{\Sigma xy}{N s_X s_Y}.$

By substituting $X - \bar{X}$ for x and $Y - \bar{Y}$ for y and simplifying, this
becomes

$r = \dfrac{N\,\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\,\Sigma X^2 - (\Sigma X)^2][N\,\Sigma Y^2 - (\Sigma Y)^2]}}.$

Entering the necessary values from Table 19.1 gives:

$r = \dfrac{(20)(856.0) - (90.7)(173)}{\sqrt{[(20)(453.93) - (90.7)^2][(20)(1,705) - (173)^2]}}$

$= +0.758.$

Note that this expression automatically supplies the sign for r.
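
The formula in terms of the original sums translates directly into a few lines of code. This is a minimal Python sketch; the function name `correlation_from_sums` is an illustrative choice, and the inputs are the totals of Table 19.1.

```python
from math import sqrt


def correlation_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Coefficient of correlation computed from the original sums only."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = correlation_from_sums(n=20, sum_x=90.7, sum_y=173,
                          sum_xy=856.0, sum_x2=453.93, sum_y2=1705)
print(round(r, 3))  # 0.758
```

The sign of the result comes from the numerator alone, since the denominator is a positive square root.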

* For derivation of the foregoing expression for r, see Appendix S, section 19.4.
Having obtained r by that expression, it is possible to get the estimating equation
and $s_{Y.X}$ from the formulas used with correlation of grouped data:

$Y_c = \bar{Y} + r\,\dfrac{s_Y}{s_X}\,(X - \bar{X})$

and

$s_{Y.X} = s_Y\sqrt{1 - r^2}.$

SOME CAUTIONS

Correlation and causation. The coefficient of correlation must be
thought of, not as something that proves causation, but only as a measure
of co-variation. Any one of the following situations may, in fact, obtain:

1. A variation in either variable may be caused (directly or indirectly) by
a variation in the other. The variable that is supposed to be the cause of
variations in the other is usually taken as the independent variable and
plotted along the X-axis. Thus, because dividends on stocks are thought
to affect stock prices, rather than vice versa, a dividends series would
be made the independent variable. It is a logical process which
determines the statistician's belief that there is causal relationship between the
two variables, and his belief as to which is cause and which is effect. It
must be evident, then, that the coefficient of correlation in itself does not
say that X causes Y, any more than it says that Y causes X.

2. Co-variation of the two variables may be due to a common cause or
causes affecting each variable in the same way, or in opposite ways. If it
should be found that there is correlation between automobile accidents
per 1,000 persons and per capita federal income tax payments, it should
not hastily be concluded that it takes an automobile accident to jar a
person into paying his income tax; nor is it necessarily true that making
large tax payments incapacitates a person for driving carefully. It is
quite possible, however, that in states where the average income is high,
the per capita income tax will be large, a large proportion of the people
will own automobiles, and accidents will be numerous.

3. The causal relationship between the two variables may be a result of
interdependent relationships. Thus, a high price for a commodity
stimulates its production; but increased production may increase or
decrease the cost of a commodity, depending upon the period of time
under observation and whether it is an increasing- or decreasing-cost
industry, and through the change in cost the price will be affected.

4. The correlation may be due to chance. Even though there may be no
relationship whatever between the variables in the universe from which
the sample is drawn, it may be that enough of the paired variables that
are selected may vary together, just by chance, to give a fair degree of
correlation. Thus it might be found that, in a given group of male
students, there was positive correlation between the size of their shoes
and the amount of money in their pockets. Yet it is hard to develop a
theory as to why this should be so, and the chances are that another
sample would yield quite different results. In Chapter 26 brief attention
will be given to measurement of the reliability of r.

Heterogeneity. In observational data, heterogeneity in a
frequency distribution may often be spotted by bi-modality or the presence
of a few items which are too far out of line with the other items to be
considered a matter of chance. On the scatter diagram, such
heterogeneity may show up as a tendency for the dots to cluster into two or more
groups, or for one or more dots to be far removed from the others on the
chart. Where heterogeneity is observed, it is better to classify the data
on some rational basis and correlate each group separately. Individual
items clearly governed by a different set of causes should be eliminated
before correlating. If these common-sense steps are not taken, one may
obtain a misleading impression, not only as to the degree of correlation,
but sometimes even as to its sign.

* In the following paragraphs the material dealing with heterogeneity is based on
a discussion of the same topic in F. E. Croxton and D. J. Cowden, Practical Business
Statistics, Second Edition, Prentice-Hall, Inc., New York, 1948, Chapter 14; and in
F. E. Croxton, Elementary Statistics with Applications in Medicine, Prentice-Hall,
Inc., New York, 1953. Charts 19.8, 19.9, and 19.10 are also from the latter book.
The treatment of errors of measurement, use of averages, non-linear relationship,
and elimination of relevant data is based on similar material in the former volume.

Chart 19.8A is an illustrative scatter diagram showing low correlation.
In Chart 19.8B, the two component groups are shown by means of
different symbols, and it is seen that two fairly high correlations are present.
It is also possible that two different groups, each having little or no
correlation, could be so located on a scatter plot that, if they were
combined, moderate positive (or negative) correlation would appear to be
present.

Chart 19.8A. Illustrative Scatter Diagram Showing Low Correlation: Two
Dissimilar Groups Not Identified. From F. E. Croxton, Elementary Statistics
with Applications in Medicine, Prentice-Hall, Inc., New York, 1953, p. 128.

Chart 19.8B. Same Scatter Diagram as in Chart 19.8A, But Indicating Fairly
High Correlation for Each of Two Dissimilar Groups, Shown by Crosses and
Dots. From the same source as Chart 19.8A.

Another sort of heterogeneity is shown in Chart 19.9. There are nine
clustered dots in Chart 19.9 which show low correlation, r = +0.32, and
one dot far removed from the others. For all ten dots, r = +0.79. The
presence of a single, almost certainly non-homogeneous (or, at least,
non-comparable) observation such as this may result in an even higher
correlation coefficient when little or no correlation exists for the other
observations. It is altogether possible that Chart 19.9 illustrates also
the sort of heterogeneity mentioned in the preceding paragraph; the
upper four dots of the cluster of nine may represent a category different
from that represented by the lower five dots. In any event, the
investigator should look into that possibility.

It should be fairly obvious that the reverse of the situation shown in
Chart 19.9 might also occur. That is to say, a cluster of dots might show
high correlation, but one extreme dot might be so located that its inclusion
with the others would result in low correlation. Chart 19.10 shows a
situation in which a low correlation is made even lower through the
inclusion of an extreme pair of values; r is decreased from
+0.348 to +0.290.
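
The effect of a single extreme item on r can be seen with a few made-up points. The sketch below is a minimal Python illustration; the data are hypothetical and are not the data behind Charts 19.9 and 19.10 (the source of Chart 19.9 is withheld), and the function name `pearson_r` is an illustrative choice.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Coefficient of correlation from the original sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    return (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))


# Hypothetical cluster of nine dots with no correlation at all...
cluster_x = [1, 2, 3, 1, 2, 3, 1, 2, 3]
cluster_y = [1, 1, 1, 2, 2, 2, 3, 3, 3]
r_cluster = pearson_r(cluster_x, cluster_y)
# ...and the same dots plus one item far removed from the others.
r_cluster_plus_extreme = pearson_r(cluster_x + [10], cluster_y + [10])

# The reverse case: nine dots in perfect correlation, plus one extreme item.
line_x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
line_y = [1, 2, 3, 4, 5, 6, 7, 8, 9]
r_line = pearson_r(line_x, line_y)
r_line_plus_extreme = pearson_r(line_x + [10], line_y + [0])

print(round(r_cluster, 3), round(r_cluster_plus_extreme, 3))  # 0.0 0.906
print(round(r_line, 3), round(r_line_plus_extreme, 3))        # 1.0 0.455
```

In both directions one item out of ten dominates the coefficient, which is why the scatter diagram should be inspected before r is interpreted.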

Chart 19.9. Scatter Diagram Illustrating a Type of Heterogeneity. The
correlation is increased because of the presence of an atypical item in the
upper right corner. This chart is drawn from actual data, the source and
nature of which are withheld. Chart from page 129 of the source given below
Chart 19.8A.

Chart 19.10. Scatter Diagram Illustrating a Type of Heterogeneity. The
correlation is decreased because of the presence of a possibly atypical item
at the top of the chart. The data represent the IQ's of 26 fraternal twins of
unlike sex and are from A. H. Wingfield, Twins and Orphans, J. M. Dent and
Sons, Ltd., London and Toronto, 1928. Chart from the reference given below
Chart 19.8A. (Axis label: IQ of male twin.)

Errors of measurement. Since errors in the measurement of
the two variables are ordinarily not correlated, such errors reduce the size
of r below its true value. Such attenuation can be corrected if the
magnitude of the errors is known.

* See J. P. Guilford, Fundamental Statistics in Psychology and Education,
McGraw-Hill Book Co., Inc., New York, 1942, pp. 287-288.

Use of averages. If the data to be correlated are first grouped into
a number of size groups according to the independent variable, if $\bar{X}$ and
$\bar{Y}$ are computed for each group, and if these means are correlated, the
correlation among the means will be higher than among the individual
items taken as a whole (unless r = 1.0 for the ungrouped data). This is
so because there is now no dispersion of the actual values around the
various column means. Likewise, if the grouping and averaging is done
for a number of rows of the dependent variable, the correlation will be
increased. If the data are grouped according to both variables, so that
there is a number of cells, and if $\bar{X}$ and $\bar{Y}$ are computed for each cell and
these paired cell means (rather than their mid-values) correlated, the
correlation will be increased. The increase will be unimportant provided
there is a large number of cells. As an illustration, the correlation of
state averages will ordinarily be higher than that of the county values.
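
The rise in correlation when averages are used can be illustrated with the tree data of Table 19.1. The sketch below is a minimal Python illustration; the choice of four groups of five trees, taken in order of diameter growth, is an assumption made for the example and does not come from the text.

```python
from math import sqrt

# Diameter growth in inches (X, already in rank order) and height growth
# in feet (Y), Table 19.1.
X = [2.3, 2.5, 2.6, 3.1, 3.4, 3.7, 3.9, 4.0, 4.1, 4.1,
     4.2, 4.4, 4.7, 5.1, 5.5, 5.8, 6.2, 6.9, 6.9, 7.3]
Y = [7, 8, 4, 4, 6, 6, 12, 8, 5, 7,
     8, 7, 9, 10, 13, 7, 11, 11, 16, 14]


def pearson_r(xs, ys):
    """Coefficient of correlation from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_x2 * sum_y2)


GROUP_SIZE = 5  # assumed grouping: four size groups of five trees each
group_mean_x = [sum(X[i:i + GROUP_SIZE]) / GROUP_SIZE
                for i in range(0, len(X), GROUP_SIZE)]
group_mean_y = [sum(Y[i:i + GROUP_SIZE]) / GROUP_SIZE
                for i in range(0, len(Y), GROUP_SIZE)]

r_items = pearson_r(X, Y)                        # individual trees
r_means = pearson_r(group_mean_x, group_mean_y)  # group averages
print(round(r_items, 3), round(r_means, 3))
```

For the individual trees r is +0.758, while for the four pairs of group means it is well above +0.99, because averaging removes the scatter of the individual Y values within each group.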

Non-linear relationship. If inspection of the scatter diagram
reveals that a curved line could more appropriately be fitted to the data
than a straight line, r is a misleading measure, understating the closeness
of the relationship. A curved line should be fitted, and a coefficient of
non-linear correlation should be computed, following the procedure
explained in Chapter 20. So doing will yield a higher coefficient and one
which reflects more accurately the closeness of the relationship.
Sometimes it may be better to transform one or both of the variables into
logarithms, reciprocals, or some other function before correlating.

Elimination of relevant data. For instance, if retail sales and
payrolls are correlated for cities ranging from 100,000 to 500,000
population, the correlation will usually not be so high as if cities from
10,000 to 5,000,000 are included. This is so because retail sales and
payrolls are both positively correlated with population; and, when the
range of values along both axes is extended, the total variation is
increased without a corresponding increase in the unexplained variation.
For data of this type, one must remember to guard against heterogeneity
of the type illustrated in Chart 19.10. Consider also a different
situation: if placement scores were correlated with monthly earnings for
workers having two to five years' experience, the correlation might be
higher than if all employees of this type were included, since earnings
generally vary directly with experience, while placement scores are not
necessarily correlated positively with experience.

CORRELATION OF GROUPED DATA 

When the number of pairs of items to be correlated is large, time is
saved if the data are grouped before calculations are undertaken. First
the data are tallied,* as in Table 19.3, which shows the relationship
between per cent of farms with value of products sold of more than $4,000
and per cent of farms with automobiles, by counties. This table resembles
a scatter diagram except that each point, instead of being plotted
exactly, is merely entered in the appropriate cell. Thus, a county with
6 per cent of farms having value of products sold of more than $4,000, and
with 25 per cent of farms having automobiles, would be tallied in the
extreme lower left corner.

* Sorting, instead of tallying, may be easier and less subject to error.
This is particularly true if the data are on cards or if punch-card
equipment is available.

TABLE 19.3

Tabulation of Per Cent of Farms with Value of Products Sold of More
Than $4,000 and Per Cent of Farms with Automobiles, for a Sample
of 169 Counties, 1950

(See text, below, for description of population and method of sampling.)

[Tally table not reproduced. Horizontal axis: per cent of farms with
value of products sold over $4,000 (X). Vertical axis: per cent of farms
with automobiles (Y).]

Data from Country Gentleman's Farm Market Data Book. The figures are
based on the 1950 Census of Agriculture.

The following states, and those south of them, were excluded from the 
population sampled: Oklahoma, Arkansas, Kentucky, West Virginia, and 
Virginia. It was believed that these states were affected by a system 
of causes different from the other states. For the same reason, the
District of Columbia was also excluded, and all counties included in
“Standard Metropolitan Areas” by the Bureau of the Census. The
sample was obtained by the following procedure: The states were listed
in alphabetical order, and also the counties within each state. The 
counties of all states (including those mentioned above) were then num- 
bered, from 1 through 3,070. A digit, selected at random, turned out to 
be 5, and all county numbers in the population being studied with a ter- 
minal digit of 5 were selected. If a county so selected was a metropolitan 
area, the county with the number closest to it was substituted. This was 
taken, arbitrarily, as a county with a terminal digit of 6 rather than 4 
where a choice had to be made. We thus have a 10 per cent systematic 
sample, stratified by states, with approximately proportionate repre- 
sentation of counties for each state. 
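
The selection procedure just described can be sketched in Python. This is only an illustration under stated assumptions: the function name `systematic_sample` is hypothetical, the sets of excluded and metropolitan county numbers below are invented, and the substitution rule is read as "nearest number, trying the higher neighbour first."

```python
def systematic_sample(n_total, digit, out_of_population, metropolitan):
    """Select county numbers with the given terminal digit.

    Numbers in `out_of_population` (counties of excluded states) are skipped.
    A selected metropolitan county is replaced by the nearest eligible number,
    the higher neighbour (terminal digit 6) being tried before the lower (4).
    """
    sample = []
    for number in range(1, n_total + 1):
        if number % 10 != digit or number in out_of_population:
            continue
        if number not in metropolitan:
            sample.append(number)
            continue
        for step in range(1, 5):
            chosen = None
            for candidate in (number + step, number - step):
                eligible = (
                    1 <= candidate <= n_total
                    and candidate not in out_of_population
                    and candidate not in metropolitan
                )
                if eligible:
                    chosen = candidate
                    break
            if chosen is not None:
                sample.append(chosen)
                break
    return sample


# Hypothetical illustration: counties 15, 16, and 25 are metropolitan.
sample = systematic_sample(3070, 5, out_of_population=set(),
                           metropolitan={15, 16, 25})
print(sample[:3])  # [5, 14, 26]
```

With 3,070 numbered counties and no exclusions, one county in ten is drawn; exclusions reduce the count, which is how a sample of 169 counties can arise from the population actually studied.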

^ A more laborious, but slightly better, plan would be to use systematic sampling

Table 19.4 is a correlation table. The figures in the center of each cell
are taken from Table 19.3. The $f_y$ values are obtained by adding the
numbers horizontally; the $f_x$ values, by adding vertically. These two
sets of figures will be recognized as frequency distributions of the
dependent and independent variables, respectively. The total frequencies,
or counties N, for each distribution are, of course, the same: 169. The
three other columns and rows in the table are identical with those to
which we are accustomed for computing the mean and standard deviation from
a frequency distribution, except that here we have two frequency
distributions, one of the X values (running horizontally) and another of
the Y values (running vertically). For ease in computation, deviations are
measured in terms of class intervals from assumed means, that of X
being chosen as 8 per cent and that of Y as 6.5 per cent.

Since xy values are required for r, these also are computed for each
cell and totaled. This is done by multiplying the X deviation by the Y
deviation (shown in the upper part of each cell), and finally multiplying
this product by the appropriate frequency. The results are shown in
boldface type in the lower part of each cell. It will be noticed that the
products in the first and third quadrants are positive, while those in the
second and fourth are, of course, negative. The algebraic total of these
products is shown in the lower right-hand corner of the table. There is no
subscript for f in the expression $\Sigma f d'_x d'_y$, since each cell
frequency is common to an X class and to a Y class.

When correlating grouped data, it is most expeditious to compute r
first, after which the estimating equation and the standard error of
estimate may be obtained.

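To obtain r directly from ungrouped data, the following formula was
used:

\[ r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}. \]

For grouped data, X is replaced by $d'_x$ and Y by $d'_y$, the symbol f is
introduced, and the expression becomes

\[ r = \frac{N\Sigma f d'_x d'_y - (\Sigma f d'_x)(\Sigma f d'_y)}{\sqrt{[N\Sigma f (d'_x)^2 - (\Sigma f d'_x)^2][N\Sigma f (d'_y)^2 - (\Sigma f d'_y)^2]}}. \]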


with probability proportionate to size. Size would be measured by the number of 
farms in a county. The variability in the number of farms per county is not sufficient, 
however, to make this more laborious procedure worth while. 

It is, of course, possible to set up the two normal equations and obtain
the estimating equation first. For the method of doing this, see the first
edition of this text, p. 675 and pp. 856-857.





TABLE 19.4

Correlation Table of Per Cent of Farms with Value of Products Sold of More Than
$4,000 (X) and Per Cent of Farms with Automobiles (Y), for a Sample of 169
Counties, 1950

[Correlation table not reproduced.]

Data from Table 19.3.

For the data of Table 19.4,

\[ r = \frac{(169)(451) - (-48)(183)}{\sqrt{[(169)(1{,}194) - (-48)^2][(169)(955) - (183)^2]}} = \frac{85{,}003}{\sqrt{(199{,}482)(127{,}906)}} = +0.5322. \]
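
The substitution can be checked with a few lines of Python. This is a sketch only: the function name `grouped_r` is hypothetical, and the value 955 used for the sum of $f(d'_y)^2$ is a reading of a partly illegible figure, adopted here because it reproduces the printed r.

```python
from math import sqrt


def grouped_r(n, sum_fdxdy, sum_fdx, sum_fdy, sum_fdx2, sum_fdy2):
    """Coefficient of correlation from grouped data, deviations in class intervals."""
    numerator = n * sum_fdxdy - sum_fdx * sum_fdy
    bracket_x = n * sum_fdx2 - sum_fdx ** 2
    bracket_y = n * sum_fdy2 - sum_fdy ** 2
    return numerator / sqrt(bracket_x * bracket_y)


# Totals from Table 19.4 (955 is an assumed reading, see above).
r = grouped_r(169, 451, -48, 183, 1194, 955)
print(round(r, 4))  # 0.5322
```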

The following measures are readily computed from values shown in Table
19.4 by methods already familiar to the reader:

\[ \bar{X} = 41.678, \qquad \bar{Y} = 77.738, \]
\[ s_X = 21.144, \qquad s_Y = 13.754. \]

To obtain the estimating equation, we use

\[ Y_c - \bar{Y} = r\,\frac{s_Y}{s_X}\,(X - \bar{X}). \]

Substituting in this equation, we have

\[ Y_c - 77.738 = 0.5322\,\frac{13.754}{21.144}\,(X - 41.678), \]

or

\[ Y_c = 63.309 + 0.3462X. \]

Now since, as shown in footnote 8,

\[ r^2 = 1 - \frac{s^2_{Y.X}}{s^2_Y}, \]

it follows that

\[ s_{Y.X} = s_Y\sqrt{1 - r^2}. \]

Substituting gives:

\[ s_{Y.X} = 13.754\sqrt{1 - (0.5322)^2} = 11.64. \]
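
The estimating equation and the standard error of estimate follow from r, the two means, and the two standard deviations alone. A minimal Python sketch, with hypothetical function names, is:

```python
from math import sqrt


def estimating_equation(r, mean_x, mean_y, s_x, s_y):
    """Return (intercept, slope) of Y_c = intercept + slope * X."""
    slope = r * s_y / s_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


def standard_error_of_estimate(r, s_y):
    """s_{Y.X} = s_Y * sqrt(1 - r^2)."""
    return s_y * sqrt(1 - r ** 2)


intercept, slope = estimating_equation(0.5322, 41.678, 77.738, 21.144, 13.754)
s_yx = standard_error_of_estimate(0.5322, 13.754)
print(round(intercept, 3), round(slope, 4), round(s_yx, 2))
```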

Effect of grouping. The values obtained from the grouped data
are not exactly the same as would have been obtained had the computations
been based upon ungrouped data. Although the difference is ordinarily
slight if there are at least 12 groups in each direction, the coefficient
of correlation computed from the grouped data tends to be too small.
It will be recalled that one formula for the correlation coefficient is

\[ r = \frac{\Sigma xy}{N s_X s_Y}. \]

The errors from grouping tend to offset each other in the numerator,
provided the X and Y distributions are approximately symmetrical. However,
the standard deviations in the denominator tend to be too large, and
Sheppard's correction should be used if the conditions under which this
correction is appropriate are met.

If the 169 items are correlated, ungrouped r = +0.5499 which is, of
course, higher than the value of r = +0.5322 for the grouped data of
Table 19.4. If Sheppard's correction is applied (by subtracting
$N^2/12$ from each expression enclosed in brackets in the formula for r
for grouped data), r is found to be +0.5404. Actually, the validity of the
use of Sheppard's correction for these data is open to doubt, since both
series are of limited range.
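
The corrected coefficient can be reproduced by subtracting $N^2/12$ from each bracket of the grouped-data formula, which is Sheppard's correction expressed in class-interval units. The Python sketch below is illustrative only: the function name is hypothetical, and 955 is an assumed reading of a partly illegible total in Table 19.4.

```python
from math import sqrt


def sheppard_corrected_r(n, sum_fdxdy, sum_fdx, sum_fdy, sum_fdx2, sum_fdy2):
    """Grouped-data r with N^2/12 subtracted from each bracket of the denominator."""
    numerator = n * sum_fdxdy - sum_fdx * sum_fdy
    correction = n ** 2 / 12
    bracket_x = n * sum_fdx2 - sum_fdx ** 2 - correction
    bracket_y = n * sum_fdy2 - sum_fdy ** 2 - correction
    return numerator / sqrt(bracket_x * bracket_y)


r_corrected = sheppard_corrected_r(169, 451, -48, 183, 1194, 955)
print(round(r_corrected, 4))  # 0.5404
```

The numerator is left unchanged, in keeping with the statement that grouping errors tend to offset each other there.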


CORRELATION OF RANKED DATA 

Sometimes statistical series are composed of items the exact magnitude
of which cannot be ascertained but which are ranked according to size.
Thus, in Column 2 of Table 19.5, we have listed 11 basketball teams in
order of their United Press rankings, as of March 2, 1953. In Column 3
we have listed the same teams in order of their Associated Press rankings.
The table includes all the teams that were ranked in the first 10 by either
organization. The U.P. rankings were made on the basis of votes by
basketball coaches, while the A.P. rankings resulted from preferential
ballots submitted by sports writers and broadcasters. We wish to determine
the extent of agreement between the two sets of authorities.

Since the coefficient of correlation previously explained is not designed
to deal with ranked data, we shall use Spearman's rank correlation
coefficient, the formula for which is

\[ r_{rank} = 1 - \frac{6\Sigma D^2}{N(N^2 - 1)}, \]

in which D refers to the difference in rank between paired items in the two
series. In Table 19.5, it will be seen that the sum of the positive
differences equals the sum of the negative differences, and thereby
provides a check on the accuracy of the subtractions. Substituting the
values in the formula, we have

\[ r_{rank} = 1 - \frac{6\Sigma D^2}{(11)(121 - 1)} = +0.9. \]


The formula gives the sign of the correlation coefficient, positive in this
case. Whenever there is a tie in rank, the two or more positions should
be split among the different items. Thus, had Seton Hall and Washington
tied for second and third in U.P. rankings, each would have been
ranked 2.5; while if Seton Hall, Washington, and LaSalle had tied for
second, third, and fourth, each would have received a rank of 3.
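
The splitting of tied positions and the rank formula can be sketched in Python as follows. The function names and the scores used in the usage lines are hypothetical; they are not the basketball data of Table 19.5.

```python
def average_ranks(scores):
    """Rank scores from highest (rank 1) down, splitting tied positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    position = 0
    while position < len(order):
        end = position
        while (end + 1 < len(order)
               and scores[order[end + 1]] == scores[order[position]]):
            end += 1
        shared = (position + 1 + end + 1) / 2  # mean of the tied positions
        for k in range(position, end + 1):
            ranks[order[k]] = shared
        position = end + 1
    return ranks


def spearman_rank_r(ranks_a, ranks_b):
    """r_rank = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


print(average_ranks([90, 80, 80, 70]))      # [1.0, 2.5, 2.5, 4.0]
print(average_ranks([90, 80, 80, 80, 70]))  # [1.0, 3.0, 3.0, 3.0, 5.0]
print(spearman_rank_r([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
```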

Two paired series of values are sometimes converted into ranks and
$r_{rank}$ computed to provide a quick estimate of r for the paired values.
For instance, one might rank American League outfielders according to their
batting averages and according to their fielding records and correlate

TABLE 19.5

Computation of Values for Correlation of Ranked Data: Basketball
Team Rankings by Two News Services, March 2, 1953

Team
(1)

Indiana
Seton Hall
Washington
LaSalle
Kansas
Louisiana State
Oklahoma A & M
North Carolina State
Kansas State
Illinois
Western Kentucky
Total

[Columns of U.P. ranks, A.P. ranks, and differences not reproduced.]

Data from Durham Morning Herald, March 3, Section II. North Carolina
State was actually ranked twelfth in the A.P. list but Oklahoma (not
included above), which was ranked eleventh by A.P., was only eighteenth on
the U.P. list. For purposes of illustration, North Carolina State is shown
as eleventh on the A.P. list.

these two sets of ranks. While $r_{rank}$ may be computed more quickly than
r, some time must always be spent in ranking the data. Also, it is well
to remember that, if one wants only a rough estimate of the degree of
correlation present, it may be had from a scatter diagram of the original
values.

The reason the rank method is not so accurate as the ordinary method
is that all of the information concerning the data is not utilized. Thus,
the first differences of the values of the items in a series arranged in
order of magnitude are almost never constant; usually these differences
become smaller toward the middle of the array. If such first differences
were constant, then r and $r_{rank}$ would give identical results. If the
values, however, are distributed normally, there may be applied to
$r_{rank}$ a correction which will give the same result that would be
obtained by computing r directly. These corrections always serve to
increase the correlation; however, they are very small, in no case
increasing the correlation by so much as 0.02. Furthermore, the correction
is not always appropriate. In the present illustration, we have only the
upper tails of (possibly) normal distributions; if plotted, the data might
appear as reverse-J distributions.

For proof, see Appendix S, section 19.5.


CORRELATION OF DATA IN 2 X 2 TABLES 

Data are often encountered which fall into a dichotomous classification
on each axis. Sometimes a correlation coefficient may be desired for
such a “2 X 2” table.

Table 19.6 shows data of the academic rank and academic output of
36 teachers in a department of a state university in 1951. Is there
correlation between academic rank and academic output, as shown by the
data of Table 19.6?

One method of obtaining a correlation coefficient for a 2 X 2 table
consists of applying the product-moment formula. If we designate the
values in a 2 X 2 table thus:

    a_1    b_1
    a_2    b_2

it may be shown that the product-moment formula becomes

\[ r = \frac{a_1 b_2 - a_2 b_1}{\sqrt{(a_1 + b_1)(a_2 + b_2)(a_1 + a_2)(b_1 + b_2)}}. \]

For Table 19.6 we obtain

\[ r = \frac{(10)(13) - (5)(8)}{\sqrt{(18)(18)(15)(21)}} = \frac{130 - 40}{\sqrt{102{,}060}} = \frac{90}{\sqrt{102{,}060}} = +0.282. \]

This expression will not yield a meaningful sign for r unless the two
dichotomies are arranged as in Table 19.6, or unless both dichotomies are
reversed; reversing only one changes the sign.

Tables of corrected values of $r_{rank}$ are given in some textbooks, for
instance, R. E. Chaddock, Principles and Methods of Statistics, Houghton
Mifflin Company, Boston, 1925, p. 300 and Appendix K.

Table 25.0 is a 2 X 2 table for which a correlation coefficient was not
desired. However, the chi-square analysis discussed in Chapter 25 could be
applied to the data of Table 19.5.

The formula given above results from a simplification of the numerator of
the expression developed in G. U. Yule and M. G. Kendall, An Introduction
to the Theory of Statistics, Charles Griffin and Co., London, 1940 (12th
ed. revised), pp. 252-253. The development assumes that only two values are
possible for each variable. This is true of both variables in Table 25.0.
In Table 19.6 it is true of academic rank, since the two categories may be
thought of as “full professor” and “not full professor.” It is not true of
academic output.
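
The 2 X 2 form of the product-moment formula, and the effect of reversing one dichotomy, can be illustrated with a short Python sketch. The function name `two_by_two_r` is hypothetical; the cell values are those of Table 19.6 as used in the computation above.

```python
from math import sqrt


def two_by_two_r(a1, b1, a2, b2):
    """Product-moment r for a 2 X 2 table laid out as rows (a1, b1) and (a2, b2)."""
    numerator = a1 * b2 - a2 * b1
    denominator = sqrt((a1 + b1) * (a2 + b2) * (a1 + a2) * (b1 + b2))
    return numerator / denominator


r = two_by_two_r(10, 8, 5, 13)
# Reversing only the "academic output" dichotomy swaps the two columns.
r_reversed = two_by_two_r(8, 10, 13, 5)
print(round(r, 3), round(r_reversed, 3))  # 0.282 -0.282
```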

TABLE 19.6

Academic Rank and Academic Output of 36
Teachers in a Department of a State
University, 1951

Academic        Academic output
rank            High      Low      Total

High             10         8        18
Low               5        13        18

Total            15        21        36

“High” academic rank was for full professors, “low” for all other grades.
Academic output was measured by a system of points for each of a number of
activities, such as books written, articles written, papers read, and so
forth.

Another method of correlating data in a 2 X 2 table involves computing
the coefficient of mean square contingency, C. This is computed
from the expression

\[ C = \sqrt{\frac{(a_1 b_2 - b_1 a_2)^2}{(a_1 + b_1)(a_2 + b_2)(a_1 + a_2)(b_1 + b_2) + (a_1 b_2 - b_1 a_2)^2}}, \]

which gives, for our illustration,

\[ C = \sqrt{\frac{[(10)(13) - (8)(5)]^2}{(18)(18)(15)(21) + [(10)(13) - (8)(5)]^2}} = \sqrt{\frac{8{,}100}{102{,}060 + 8{,}100}} = \sqrt{0.073529} = 0.271. \]

The computations do not automatically provide a sign for C, but a sign
may often be supplied from examination of the data. In this case, it
would be positive.

One advantage of the coefficient of mean square contingency is that its
use is not limited to 2 X 2 tables. It may be used for larger tables, the
formula for C being that given in footnote 25.


This is a modification of the usual expression

\[ C = \sqrt{\frac{\chi^2}{N + \chi^2}}, \]

which makes it unnecessary to compute $\chi^2$ for 2 X 2 tables. Chi-square
is discussed in Chapter 25. For tables larger than 2 X 2, the usual
expression would be used.

A disadvantage of C is the fact that C does not have a maximum value
of 1.0. Its maximum value is less than 1.0; for example, it is 0.707 for a
2 X 2 table, 0.816 for a 3 X 3 table, and 0.949 for a 10 X 10 table. For
a table having the same number of columns that it has rows, the maximum
value of C may be had from:

\[ \sqrt{\frac{\text{number of columns (or rows)} - 1}{\text{number of columns (or rows)}}}. \]

Corrections may be made for this shortcoming of C, but they are not
wholly satisfactory.

Various other methods of correlating data in 2 X 2 tables are available.
Among these are: tetrachoric correlation, the method of unlike
signs, the cosine $\pi$ method, and the method of concurrent deviations.
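
Returning to the coefficient of mean square contingency, both its value for Table 19.6 and its maximum values for square tables can be verified with a brief Python sketch. The function names are hypothetical.

```python
from math import sqrt


def contingency_c_2x2(a1, b1, a2, b2):
    """Coefficient of mean square contingency for a 2 X 2 table, without chi-square."""
    cross = (a1 * b2 - b1 * a2) ** 2
    margins = (a1 + b1) * (a2 + b2) * (a1 + a2) * (b1 + b2)
    return sqrt(cross / (margins + cross))


def max_c(k):
    """Largest attainable C for a table with k rows and k columns."""
    return sqrt((k - 1) / k)


print(round(contingency_c_2x2(10, 8, 5, 13), 3))  # 0.271
print([round(max_c(k), 3) for k in (2, 3, 10)])   # [0.707, 0.816, 0.949]
```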

See C. C. Peters and W. R. Van Voorhis, Statistical Procedures and Their
Mathematical Bases, McGraw-Hill Book Co., Inc., New York, 1940, pp. 393-390.

See R. Ferber, Statistical Techniques in Market Research, McGraw-Hill Book
Co., Inc., New York, 1949, pp. 343-344.

See the first edition of this text, pp. 688-080.

See H. O. Rugg, Statistical Methods Applied to Education, Houghton Mifflin
Co., Boston, 1917, pp. 294-297.

See H. Secrist, An Introduction to Statistical Methods, The Macmillan Co.,
New York, 1933, pp. 430-432.



Symbols Used in Chapter 20

a: value of $Y_c$ when X = 0 in the estimating equations $Y_c = a + bX$,
$Y_c = a + bX + cX^2$, and $Y_c = a + bX + cX^2 + dX^3$; value of
$(\sqrt{Y})_c$ when X = 0 in the estimating equation
$(\sqrt{Y})_c = a + bX$; value of $(1/Y)_c$ when X = 0 in the estimating
equation $(1/Y)_c = a + bX$.
Log a is the value of $(\log Y)_c$ when X = 0 in the estimating equation
$(\log Y)_c = \log a + X \log b$ and when X = 1 in the estimating
equation $(\log Y)_c = \log a + b \log X$.

b: b, or log b, is a constant in the various estimating equations mentioned
above for a.

c: a constant in the estimating equations $Y_c = a + bX + cX^2$ and
$Y_c = a + bX + cX^2 + dX^3$.

d: a constant in the estimating equation $Y_c = a + bX + cX^2 + dX^3$.

$\eta$: lower-case Greek eta, the correlation ratio.

k: the number of columns in a correlation table.

N: the number of items in a sample. In two-variable linear or non-linear
correlation, N is the number of pairs of items.

$N_c$: the number of items in a column in a correlation table.

$\Omega$: upper-case Greek omega, used to identify a column in the Doolittle
forward solution, in which the first entries in each section are
$\Sigma Y$, $\Sigma XY$, and so on.

$r^2_{YX}$: coefficient of determination for X and Y.

$r^2_{Y.XX^2}$: coefficient of determination for X and Y, the estimating
equation $Y_c = a + bX + cX^2$ having been used.

$r^2_{Y.XX^2X^3}$: coefficient of determination for X and Y, the estimating
equation $Y_c = a + bX + cX^2 + dX^3$ having been used.

$r^2_{YX^2.X}$: measure of (1) the increased variation attributable to the
use of $X^2$, expressed as a ratio of (2) the amount of variation
unexplained by the use of X alone. See the coefficient of partial
determination, explained in Chapter 21.

$r^2_{\log Y.X}$: coefficient of determination for X and log Y.

$r^2_{\log Y.\log X}$: coefficient of determination for log X and log Y.

$r^2_{(1/Y).X}$: coefficient of determination for X and $1/Y$.

$r^2_{\sqrt{Y}.X}$: coefficient of determination for X and $\sqrt{Y}$.

$s_{Y.X}$: standard error of estimate for the estimating equation
$Y_c = a + bX$.

$s_{Y.XX^2}$: standard error of estimate for the estimating equation
$Y_c = a + bX + cX^2$.

$s_{Y.XX^2X^3}$: standard error of estimate for the estimating equation
$Y_c = a + bX + cX^2 + dX^3$.

$s_{\log Y.X}$: standard error of estimate for the estimating equation
$(\log Y)_c = \log a + X \log b$.

$s_{\log Y.\log X}$: standard error of estimate for the estimating equation
$(\log Y)_c = \log a + b \log X$.

$s_{(1/Y).X}$: standard error of estimate for the estimating equation
$(1/Y)_c = a + bX$.

$s_{\sqrt{Y}.X}$: standard error of estimate for the estimating equation
$(\sqrt{Y})_c = a + bX$.

$\Sigma$: upper-case Greek sigma, meaning “take the sum of.”

$\sum_{1}^{k}$: a summation over the k columns in a correlation table.

$\sum_{1}^{N_c}$: a summation over the $N_c$ items in a column in a
correlation table.

$\Sigma y^2$: total variation of the Y values.

$\Sigma(\log y)^2$: total variation of the log Y values. See footnotes 8
and 9.

$\Sigma(1/y)^2$: total variation of the $1/Y$ values. See footnote 18.

$\Sigma(\sqrt{y})^2$: total variation of the $\sqrt{Y}$ values. See
footnote 10.

$\Sigma y_c^2$: explained variation for the estimating equation
$Y_c = a + bX$.

$\Sigma y^2_{cY.XX^2}$: explained variation for the estimating equation
$Y_c = a + bX + cX^2$.

$\Sigma y^2_{cY.XX^2X^3}$: explained variation for the estimating equation
$Y_c = a + bX + cX^2 + dX^3$.

$\Sigma(\log y)_c^2$: explained variation for the estimating equation
$(\log Y)_c = \log a + b \log X$ or for the estimating equation
$(\log Y)_c = \log a + X \log b$. See footnote 9.

$\Sigma(1/y)_c^2$: explained variation for the estimating equation
$(1/Y)_c = a + bX$.

$\Sigma(\sqrt{y})_c^2$: explained variation for the estimating equation
$(\sqrt{Y})_c = a + bX$.

$\Sigma y_s^2$: unexplained variation for the estimating equation
$Y_c = a + bX$.

$\Sigma y^2_{sY.XX^2}$: unexplained variation for the estimating equation
$Y_c = a + bX + cX^2$.

$\Sigma y^2_{sY.XX^2X^3}$: unexplained variation for the estimating equation
$Y_c = a + bX + cX^2 + dX^3$.

$\Sigma(\log y)_s^2$: unexplained variation for the estimating equation
$(\log Y)_c = \log a + b \log X$ or for the estimating equation
$(\log Y)_c = \log a + X \log b$. See footnote 9.

$\Sigma(1/y)_s^2$: unexplained variation for the estimating equation
$(1/Y)_c = a + bX$.

$\Sigma(\sqrt{y})_s^2$: unexplained variation for the estimating equation
$(\sqrt{Y})_c = a + bX$.

X: the X series, also an observed value in the X series. Thus, we refer
to correlating X and Y, but $\Sigma X$ means “sum the values in the X
series.”

y: see $\Sigma y^2$; $y = Y - \bar{Y}$.

$y_c$: see $\Sigma y_c^2$ and $\Sigma y_c^2$ with various additional
subscripts. In general, $y_c$ (with or without additional subscripts) is
the difference between the appropriate computed $Y_c$, or computed
transformed $Y_c$, value and the corresponding arithmetic mean.

$y_s$: see $\Sigma y_s^2$ and $\Sigma y_s^2$ with various additional
subscripts. In general, $y_s$ (with or without additional subscripts) is
the difference between an observed Y, or transformed observed Y, value and
the corresponding computed value.

Y: the Y series, also an observed value in the Y series. Thus, we refer
to correlating X and Y, but $\Sigma Y$ means “sum the values in the Y
series.”

$\bar{Y}$: the arithmetic mean of the Y values.

$\bar{Y}_c$: when used in connection with the correlation ratio, the
arithmetic mean of a column. (This symbol was used in Chapter 19 to
mean the arithmetic mean of the computed Y values but is not so
used in this chapter.)

$\overline{\log Y}$: the arithmetic mean of the log Y values.

$\overline{1/Y}$: the arithmetic mean of the $1/Y$ values.

$\overline{\sqrt{Y}}$: the arithmetic mean of the $\sqrt{Y}$ values.

$Y_c$: a computed Y value.

$(\log Y)_c$: a computed log Y value.

$(1/Y)_c$: a computed $1/Y$ value.

$(\sqrt{Y})_c$: a computed $\sqrt{Y}$ value.



CHAPTER 20 


Correlation II: Two-variable Non-linear Correlation


The preceding chapter considered the simplest type of relationship
between two variables: a constant amount of increase in the dependent
variable associated with a unit increase in the independent variable. Not
always, however, is the linear hypothesis satisfactory. The data of
diameter growth and height growth of the trees, shown in Chart 19. o.,
were adequately described by a linear estimating equation. The
relationship between the diameter and the volume of trees is not linear, as
may be seen in Chart 20.1, which presents the data of Table 20.1. As
noted in the table, the volume figures represent one-tenth of the number of
board feet of lumber in a tree. The 20 pairs of values are for ponderosa
pine trees selected at random from a Tree Measurement Book from the
Coconino National Forest in Arizona.

POLYNOMIALS

Second-degree curve. To describe the relationship between diameter
and volume, we shall first employ an estimating equation of the type

\[ Y_c = a + bX + cX^2 \]

and compare our results with those obtained when using a straight line.
After considering an estimating equation of the type

\[ Y_c = a + bX + cX^2 + dX^3 \]

for a different set of illustrative data, we shall return to the data of
diameter and volume of ponderosa pine trees and examine several possible
transformations of those data.

For a second-degree curve, three normal equations are required. They
are:

I. $\Sigma Y = Na + b\Sigma X + c\Sigma X^2$;

II. $\Sigma XY = a\Sigma X + b\Sigma X^2 + c\Sigma X^3$;

III. $\Sigma X^2Y = a\Sigma X^2 + b\Sigma X^3 + c\Sigma X^4$.

Chart 20.1. Diameter and Volume of Twenty Ponderosa Pine Trees and
Second-Degree Estimating Equation, with Zones of ±1, ±2, and ±3 Standard
Errors of Estimate. Data of Table 20.1. Estimating equation shown by solid
line. (Vertical axis: volume, board feet ÷ 10.)

TABLE 20.1

Computation of Values Used for Determining Measures of Relationship
Based on Straight-Line and on Second-Degree Curve for Diameter
and Volume of Twenty Ponderosa Pine Trees

X = diameter at breast height (inches); Y = volume* (board feet ÷ 10)

           X       Y       XY        X²Y      X²       X³          X⁴       Y²

          36     192    6,912    248,832   1,296   46,656   1,679,616   36,864
          28     113    3,164     88,592     784   21,952     614,656   12,769
          28      88    2,464     68,992     784   21,952     614,656    7,744
          41     294   12,054    494,214   1,681   68,921   2,825,761   86,436
          19      28      532     10,108     361    6,859     130,321      784
          32     123    3,936    125,952   1,024   32,768   1,048,576   15,129
          22      51    1,122     24,684     484   10,648     234,256    2,601
          38     252    9,576    363,888   1,444   54,872   2,085,136   63,504
          25      56    1,400     35,000     625   15,625     390,625    3,136
          17      16      272      4,624     289    4,913      83,521      256
          31     141    4,371    135,501     961   29,791     923,521   19,881
          20      32      640     12,800     400    8,000     160,000    1,024
          25      86    2,150     53,750     625   15,625     390,625    7,396
          19      21      399      7,581     361    6,859     130,321      441
          39     231    9,009    351,351   1,521   59,319   2,313,441   53,361
          33     187    6,171    203,643   1,089   35,937   1,185,921   34,969
          17      22      374      6,358     289    4,913      83,521      484
          37     205    7,585    280,645   1,369   50,653   1,874,161   42,025
          23      57    1,311     30,153     529   12,167     279,841    3,249
          39     265   10,335    403,065   1,521   59,319   2,313,441   70,225

Total    569   2,460   83,777  2,949,733  17,437  567,749  19,361,917  462,278

* Volume was ascertained by means of the “Scribner decimal C” rule, which
is described in D. Bruce and F. X. Schumacher, Forest Mensuration,
McGraw-Hill Book Co., Inc., New York, 1942, pp. 150-163. Data supplied by
courtesy of the Forest Service of the United States Department of
Agriculture. The figures are a random sample from a Tree Measurement Book
from the Coconino National Forest in Arizona.

Substituting the values obtained in Table 20.1, we have

I. 2,460 = 20a + 569b + 17,437c;

II. 83,777 = 569a + 17,437b + 567,749c;

III. 2,949,733 = 17,437a + 567,749b + 19,361,917c.

In order to get the values of a, b, and c, it is necessary to solve these
three equations simultaneously. In describing one procedure for solving
three simultaneous equations, we shall first state each step in general


terms and then indicate the specific operation for this problem. The
steps are:

1. Multiply normal equation I by such a number that the coefficient of
one unknown will become the same as the coefficient of the same
unknown in normal equation II. For our data, normal equation I
is multiplied by $\Sigma X \div N = 28.45$ to yield

(I X 28.45). 69,987 = 569a + 16,188.05b + 496,082.65c.

2. Subtract modified equation I from II, or vice versa, to yield Equation
A, which will contain two unknowns. For the present problem,
Equation A will contain only b and c.

II. 83,777 = 569a + 17,437b + 567,749c.

(I X 28.45). 69,987 = 569a + 16,188.05b + 496,082.65c.

A. 13,790 = 1,248.95b + 71,666.35c.

3. Multiply normal equation II by such a number that the coefficient
of the unknown which is not in Equation A will be the same in II
as in normal equation III. In our problem, we multiply normal
equation II by $\Sigma X^2 \div \Sigma X = 30.644991$, obtaining

(II X 30.644991).
2,567,345.411 = 17,437a + 534,356.7086b + 17,398,662.995c.

4. Subtract modified equation II from III, or vice versa, to get Equation
B, which will contain the same two unknowns as Equation A.
For our data, we have:

III. 2,949,733 = 17,437a + 567,749b + 19,361,917c.

(II X 30.644991).
2,567,345.411 = 17,437a + 534,356.7086b + 17,398,662.995c.

B. 382,387.589 = 33,392.2914b + 1,963,254.005c.

5. Solve Equations A and B simultaneously (the procedure was
described on pages 208-209) to obtain the values of the two constants
in those equations. Doing this for the data of diameter and
volume of the trees gives:

b = -5.620315;
c = +0.2903663.

6. Substitute, in any one of the normal equations, the values computed
in Step 5 in order to find the value of the unknown which was not in
Equations A and B. Using I, we have

2,460 = 20a + (569)(-5.620315) + (17,437)(0.2903663),

20a = 594.842;
a = 29.7421.

7. As a check, substitute the values obtained in Steps 5 and 6 in a normal
equation not used in Step 6. Employing Equation II gives

83,777 = (569)(29.7421) + (17,437)(-5.620315) + (567,749)(0.2903663)

= 83,776.9987.

The second-degree equation for estimating tree volume from diameter is

\[ Y_c = 29.7 - 5.62X + 0.2904X^2. \]

This equation is shown on Chart 20.1 by a solid line. In view of the
appearance of the scatter diagram and the estimating equation, the reader
may be surprised that b has a negative sign. The reason is that Chart
20.1 shows only part of the curve. If the chart were to be redrawn with
a horizontal scale beginning at zero, the estimating equation would be
seen to be roughly U-shaped.

For a tree having a diameter of 30 inches, the estimated volume would
be

\[ Y_c = 29.7 - (5.62)(30) + (0.2904)(30)^2 = 122.5 \text{ tens of board feet.} \]
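
The hand solution can be cross-checked by machine. The Python sketch below is not the elimination procedure of Steps 1 to 7: it forms the column totals of Table 20.1 from the 20 pairs of values and solves the three normal equations by ordinary Gauss-Jordan elimination with exact fractions. The function names `power_sum` and `solve_linear` are hypothetical.

```python
from fractions import Fraction

# (diameter X in inches, volume Y in board feet / 10), as read from Table 20.1
TREES = [(36, 192), (28, 113), (28, 88), (41, 294), (19, 28),
         (32, 123), (22, 51), (38, 252), (25, 56), (17, 16),
         (31, 141), (20, 32), (25, 86), (19, 21), (39, 231),
         (33, 187), (17, 22), (37, 205), (23, 57), (39, 265)]


def power_sum(p, q=0):
    """Sum of X**p * Y**q over the sample, e.g. power_sum(2, 1) is the sum of X^2 Y."""
    return sum(x ** p * y ** q for x, y in TREES)


def solve_linear(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with exact fractions."""
    n = len(rhs)
    rows = [[Fraction(v) for v in row] + [Fraction(b)]
            for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * w for v, w in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


n = len(TREES)
normal_matrix = [
    [n, power_sum(1), power_sum(2)],              # normal equation I
    [power_sum(1), power_sum(2), power_sum(3)],   # normal equation II
    [power_sum(2), power_sum(3), power_sum(4)],   # normal equation III
]
normal_rhs = [power_sum(0, 1), power_sum(1, 1), power_sum(2, 1)]

a, b, c = (float(v) for v in solve_linear(normal_matrix, normal_rhs))
estimate_30 = a + b * 30 + c * 30 ** 2
print(round(a, 3), round(b, 4), round(c, 5), round(estimate_30, 1))
```

Exact fractions are used so that the subtraction of nearly equal large products, which makes the hand computation sensitive to rounding, cannot disturb the result.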

Total variation is computed by means of the same expression that was
used for linear correlation,

\[ \Sigma y^2 = \Sigma Y^2 - \bar{Y}\Sigma Y = 462{,}278 - (123)(2{,}460) = 159{,}698. \]

Since we have the values of a, b, and c, we can ascertain the explained
variation, which is*

\[ \Sigma y^2_{cY.XX^2} = a\Sigma Y + b\Sigma XY + c\Sigma X^2Y - \bar{Y}\Sigma Y, \]
\[ = (29.7421)(2{,}460) + (-5.620315)(83{,}777) + (0.2903663)(2{,}949{,}733) - (123)(2{,}460), \]
\[ = 156{,}235.5. \]

We may now obtain $\Sigma y^2_{sY.XX^2}$ in the same manner as for linear
correlation:

\[ \Sigma y^2_{sY.XX^2} = \Sigma y^2 - \Sigma y^2_{cY.XX^2} = 159{,}698 - 156{,}235.5 = 3{,}462.5. \]

The standard error of estimate is

\[ s_{Y.XX^2} = \sqrt{\frac{\Sigma y^2_{sY.XX^2}}{N}} = \sqrt{\frac{3{,}462.5}{20}} = 13.2 \text{ tens of board feet.} \]

The zones of ±1, ±2, and ±3 $s_{Y.XX^2}$, around the estimating equation,
are shown in Chart 20.1 by broken lines. Estimates of volume, such as that
made for a tree having a diameter of 30 inches, may be written
$Y_c \pm 13.2$.
* $Y.XX^2$ is a rather awkward subscript, but it indicates quite clearly that we are
dealing with measures computed in relation to an estimating equation employing the
first and second powers of the independent variable.

Chap. 20] TWO-VARIABLE NON-LINEAR CORRELATION 491

The coefficient of determination is, as before, the ratio of explained
variation to total variation:

$$r^2_{Y.XX^2} = \frac{\Sigma y^2_{c\,Y.XX^2}}{\Sigma y^2} = \frac{156{,}235.5}{159{,}698} = 0.978.$$

The coefficient of correlation is the square root of this figure,

$$r_{Y.XX^2} = 0.989,$$

but it has no sign. The reason for the lack of a sign is that, when an
estimating equation is curvilinear, the relationship between the two
variables may be positive in one portion of the equation but negative in
another portion.
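
The arithmetic of the preceding paragraphs can be retraced in a few lines. The sketch below is only an illustration: it takes the summations and the constants a, b, and c exactly as quoted in the text, and the variable names are chosen here for readability rather than taken from the book.

```python
import math

# Summations quoted in the text for the twenty trees (Table 20.1).
N = 20
sum_Y, sum_XY, sum_X2Y, sum_Y2 = 2460, 83777, 2949733, 462278

# Constants of the second-degree equation as quoted in the text.
a, b, c = 29.7421, -5.620315, 0.2903663

mean_Y = sum_Y / N
total = sum_Y2 - mean_Y * sum_Y                       # total variation
explained = a * sum_Y + b * sum_XY + c * sum_X2Y - mean_Y * sum_Y
unexplained = total - explained
std_error = math.sqrt(unexplained / N)                # divisor N, as in the text
r_squared = explained / total
r = math.sqrt(r_squared)                              # reported without a sign

print(f"total = {total:,.1f}, explained = {explained:,.1f}")
print(f"s = {std_error:.1f}, r^2 = {r_squared:.3f}, r = {r:.3f}")
```

Because the explained variation is built from rounded constants, the last digit of each result may differ slightly from a hand computation carried to more places.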

Comparison of results with those obtained from the use of a straight line.
From the appearance of Chart 20.1, it is quite clear that the relationship
between the diameter and volume of the ponderosa pine trees is non-linear,
and we shall see, in Chapter 26, that the correlation resulting from
the use of the second-degree curve is significantly higher than that based
upon a straight line. For the present, we are interested only in comparing
the results just obtained with those for a straight-line relationship.
Using N and the appropriate summations from Table 20.1, the solution
of the normal equations

$$\begin{aligned}
\text{I.}\quad & \Sigma Y = Na + b\Sigma X \quad\text{and}\\
\text{II.}\quad & \Sigma XY = a\Sigma X + b\Sigma X^2
\end{aligned}$$

gives

a = -191.124274 and
b = 11.041275.

The straight-line estimating equation is

$$Y_c = -191.1 + 11.04X.$$

This equation is shown, by means of a solid line, on Chart 20.2, and it is
clear that a straight line is not a satisfactory description of the relationship.
Explained variation, from the straight line, is

$$\begin{aligned}
\Sigma y^2_c &= a\Sigma Y + b\Sigma XY - \bar{Y}\,\Sigma Y \\
&= (-191.124274)(2{,}460) + (11.041275)(83{,}777) - (123)(2{,}460) \\
&= 152{,}259.2.
\end{aligned}$$

Total variation is

$$\Sigma y^2 = \Sigma Y^2 - \bar{Y}\,\Sigma Y = 462{,}278 - (123)(2{,}460) = 159{,}698,$$

the same as for the second-degree curve, and

$$\Sigma y^2_s = \Sigma y^2 - \Sigma y^2_c = 159{,}698 - 152{,}259.2 = 7{,}438.8.$$

Chart 20.2. Diameter and Volume of Ponderosa Pine Trees and
Straight-line Estimating Equation, with Zones of ±1, ±2, and ±3 Standard
Errors of Estimate. Data of Table 20.1. Estimating equation shown by solid line.
(Horizontal axis: diameter in inches.)

The standard error of estimate is

$$s_{YX} = \sqrt{\frac{\Sigma y^2_s}{N}} = \sqrt{\frac{7{,}438.8}{20}} = 19.3 \text{ tens of board feet},$$

a decidedly larger value than was obtained when the second-degree curve
was used. The zones of ±1, ±2, and ±3 $s_{YX}$ are shown on Chart 20.2 by
broken lines.

As was to be expected, the linear coefficients of determination and
correlation are smaller than those based upon the second-degree curve.*
They are:

$$r^2_{YX} = \frac{\Sigma y^2_c}{\Sigma y^2} = \frac{152{,}259.2}{159{,}698} = 0.953, \quad\text{and}\quad r_{YX} = +0.976.$$
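
The straight-line comparison can be reproduced directly from the summations. The following sketch solves the two normal equations by determinants (a choice made here for brevity, not necessarily the procedure used in the book) and then repeats the variation measures; all names are illustrative.

```python
import math

# Summations quoted in the text (Table 20.1).
N = 20
sum_X, sum_Y = 569, 2460
sum_X2, sum_XY, sum_Y2 = 17437, 83777, 462278

# Normal equations:  sum_Y  = N*a     + b*sum_X
#                    sum_XY = a*sum_X + b*sum_X2
det = N * sum_X2 - sum_X ** 2
a = (sum_Y * sum_X2 - sum_X * sum_XY) / det
b = (N * sum_XY - sum_X * sum_Y) / det

mean_Y = sum_Y / N
total = sum_Y2 - mean_Y * sum_Y
explained = a * sum_Y + b * sum_XY - mean_Y * sum_Y
unexplained = total - explained
std_error = math.sqrt(unexplained / N)
r_squared = explained / total
r = math.copysign(math.sqrt(r_squared), b)   # a straight line allows a sign

print(f"a = {a:.6f}, b = {b:.6f}")
print(f"s = {std_error:.1f}, r^2 = {r_squared:.3f}, r = {r:+.3f}")
```

Setting these figures beside those for the second-degree curve shows the smaller explained variation and the larger standard error of estimate that the text describes.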


Third-degree curve. As an illustration of the third-degree curve,
and incidentally also of the law of diminishing returns, we shall use data
derived from experiments with nitrogen fertilizer and tobacco yield at
Tifton, Georgia. One thousand pounds of fertilizer per acre were applied
to five different plots. Of the active ingredients, phosphoric acid and
potash were held constant at 8 per cent and 5 per cent, respectively; and
the nitrogen was made to vary as follows: none, 2 per cent, 3 per cent,
4 per cent, 5 per cent. Presumably the experiment was so conducted that
differences in yield were not attributable to differences in soil fertility,
drainage, and so forth, between plots. The experiment was repeated in
three different years. Of the total variation, what proportion can be
explained by the varying amount of nitrogen used? While it is possible
that the experiment was not perfectly designed, the data indicate almost
perfect correlation when the relationship is assumed to be of the type

$$Y_c = a + bX + cX^2 + dX^3.$$

This can be roughly verified by inspection of the scatter diagram, Chart
20.3. The heavy horizontal lines are the average yields for each of the
percentages of nitrogen which are given. These means are not necessary
for the solution of the problem, but are useful in discovering the type of
curve to fit.

* It is possible to set up a measure

$$\frac{\Sigma y^2_{c\,Y.XX^2} - \Sigma y^2_{c\,YX}}{\Sigma y^2_{s\,YX}}$$

which expresses (1) the increase in explained variation, attributable to the use of $X^2$,
as a ratio of (2) the amount of variation unexplained by using X alone. Dividing
the numerator and denominator of the above expression by $\Sigma y^2$ allows us to write

$$r^2_{YX^2.X} = \frac{r^2_{Y.XX^2} - r^2_{YX}}{1 - r^2_{YX}}.$$

This measure is strictly analogous to the coefficient of partial determination,
discussed in the next chapter. It will be referred to again in Chapter 26 when we
undertake to ascertain whether the non-linear coefficient of determination is
significantly larger than the linear coefficient.


Chart 20.3. Per Cent Nitrogen in Fertilizer and Yield Per Acre of Tobacco,
at Tifton, Georgia. Data of Table 20.2. The horizontal lines show average
yield per acre for each percentage of nitrogen, while the curve represents
values computed from the third-degree equation. (Vertical axis: yield in pounds.)

Solution of normal equations. Since four constants must be found, four
normal equations of the following type must be used:*

$$\begin{aligned}
\text{I.}\quad & \Sigma Y = Na + b\Sigma X + c\Sigma X^2 + d\Sigma X^3;\\
\text{II.}\quad & \Sigma XY = a\Sigma X + b\Sigma X^2 + c\Sigma X^3 + d\Sigma X^4;\\
\text{III.}\quad & \Sigma X^2Y = a\Sigma X^2 + b\Sigma X^3 + c\Sigma X^4 + d\Sigma X^5;\\
\text{IV.}\quad & \Sigma X^3Y = a\Sigma X^3 + b\Sigma X^4 + c\Sigma X^5 + d\Sigma X^6.
\end{aligned}$$

* Had three observations been taken for 1 per cent nitrogen, the origin could
conveniently have been taken at the mean of the X values (2.5). Then the sum of the
odd powers of X would have been zero, and would have disappeared from the normal
equations. We should then have had two pairs of normal equations to solve
simultaneously:

$$\begin{aligned}
\text{I.}\quad & \Sigma Y = Na + c\Sigma X^2; & \text{II.}\quad & \Sigma XY = b\Sigma X^2 + d\Sigma X^4;\\
\text{III.}\quad & \Sigma X^2Y = a\Sigma X^2 + c\Sigma X^4; & \text{IV.}\quad & \Sigma X^3Y = b\Sigma X^4 + d\Sigma X^6.
\end{aligned}$$

The values required are computed in Table 20.2, and their substitutions
result in the following normal equations:

I. 16,934 = 15a + 42b + 162c + 672d;

II. 50,630 = 42a + 162b + 672c + 2,934d;

III. 197,198 = 162a + 672b + 2,934c + 13,272d;

IV. 822,884 = 672a + 2,934b + 13,272c + 61,542d.

Following our previous procedure, we may solve together Equations I and
II; II and III; III and IV, in each case eliminating a. This gives three
equations:

A. 48,222 = 666b + 3,276c + 15,786d;

B. 80,256 = 1,980b + 14,364c + 82,116d;

C. 790,152 = 23,724b + 178,416c + 1,051,020d.

We may now solve together A and B and then B and C, eliminating b.
The equations are thus reduced to two:

D. -42,029,064 = 3,079,944c + 23,432,976d;

E. -339,492,384 = 12,492,144c + 132,899,616d.

Solving Equations D and E simultaneously, we find that

d = -4.4648847 and
c = 20.323899.

By substituting these values in Equation A, B, or C, we find that

b = 78.263630.

Substituting the values found for b, c, and d in Equation I, II, III, or IV,
we find

a = 890.32389.

It is advisable to check the values of d, c, b, and a at each step, since any
error made in the early stages will vitiate all subsequent computations.
One method of checking is to calculate each of the constants twice, by
substituting in two different equations. Possibly even better is to
substitute all of the constants known at any time in one of the remaining
equations. For instance, if the value of a has been found by substituting
values of b, c, and d in Equation I, a final check may be made by
substituting a, b, c, and d in Equation IV. Thus,

822,884 = 672(890.32389) + 2,934(78.263630) + 13,272(20.323899)
          + 61,542(-4.4648847)
        = 598,297.65 + 229,625.49 + 269,738.79 - 274,777.93
        = 822,884.00.
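
As a cross-check on the hand elimination, the four normal equations can be solved mechanically. The sketch below is not the pairwise scheme of Equations A to E; it substitutes ordinary Gauss-Jordan elimination carried out in exact fractions, so that rounding plays no part until the final conversion. The function name is hypothetical, and the power sums are those appearing in Equations I to IV above.

```python
from fractions import Fraction

def solve_normal_equations(coeffs, consts):
    """Solve a square linear system by Gauss-Jordan elimination on exact fractions."""
    n = len(consts)
    rows = [[Fraction(v) for v in row] + [Fraction(k)]
            for row, k in zip(coeffs, consts)]
    for col in range(n):
        # Bring a row with a non-zero entry in this column into pivot position.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        # Clear this column from every other row.
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

# N, sum X, sum X^2, ..., sum X^6 as they appear in Equations I to IV.
power_sums = [15, 42, 162, 672, 2934, 13272, 61542]
coeffs = [power_sums[i:i + 4] for i in range(4)]
consts = [16934, 50630, 197198, 822884]   # sum Y, sum XY, sum X^2 Y, sum X^3 Y

a, b, c, d = (float(v) for v in solve_normal_equations(coeffs, consts))
print(f"a = {a:.5f}, b = {b:.6f}, c = {c:.6f}, d = {d:.7f}")
```

The constants obtained this way should agree with the hand solution to about the number of places that the hand work can be trusted.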



TABLE 20.2

Computation of Values Required to Obtain Measures of Relationship Between Per Cent Nitrogen in Fertilizer and Yield
per Acre of Tobacco, Tifton, Georgia

(Fertilizer is 1,000 pounds per acre; P2O5 and K2O are 8 per cent and 5 per cent, respectively. The yields on all plots were unusually high in 1925;
consequently, they were reduced by a factor which reduced their average to the average of 1924 and 1926.)

The estimating equation, then, is

$$Y_c = 890.32 + 78.264X + 20.324X^2 - 4.4649X^3.$$

Using this equation, the $Y_c$ values may be computed as follows:

  X      a + bX       cX^2        dX^3      Y_c (pounds)
  0      890.32        0           0          890.3
  1      968.58       20.32       -4.46       984.4
  2    1,046.85       81.30      -35.72     1,092.4
  3    1,125.11      182.92     -120.56     1,187.5
  4    1,203.37      325.18     -285.76     1,242.8
  5    1,281.64      508.10     -558.12     1,231.6

If we omit the $Y_c$ value for X = 1 (since there is no observation for
X = 1), sum the other $Y_c$ values, and multiply the result by 3 (since there
were three observations for each X value) we obtain 16,933.8 pounds,
which is in agreement with the $\Sigma Y$ value of Table 20.2.

As may be seen from Chart 20.3, there is a point of inflection at about
1½ per cent nitrogen, and the curve reaches a maximum of nearly 1,250
pounds shortly after the nitrogen reaches 4 per cent. These are,
respectively, the points of diminishing marginal returns and diminishing total
returns. How to locate these points more exactly is explained in
Appendix S, section 20.1.
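
Both statements above can be checked numerically. The sketch below is offered only as an illustration and is not the procedure of Appendix S: it assumes the usual calculus conditions, namely that the point of inflection is where the second derivative of the cubic vanishes and the maximum is where the first derivative vanishes with a negative second derivative. The constants are those quoted in the text.

```python
import math

a, b, c, d = 890.32389, 78.263630, 20.323899, -4.4648847

def estimate(x):
    """Computed yield in pounds for x per cent nitrogen."""
    return a + b * x + c * x ** 2 + d * x ** 3

# Check on the Y_c column: three plots at each of the five nitrogen levels.
levels = [0, 2, 3, 4, 5]
check_total = 3 * sum(estimate(x) for x in levels)

# Point of inflection: second derivative 2c + 6dX equals zero.
inflection = -c / (3 * d)

# Maximum: first derivative b + 2cX + 3dX^2 equals zero, second derivative negative.
disc = math.sqrt((2 * c) ** 2 - 4 * (3 * d) * b)
roots = [(-2 * c + disc) / (6 * d), (-2 * c - disc) / (6 * d)]
x_max = next(x for x in roots if 2 * c + 6 * d * x < 0)

print(f"3 * sum of Y_c = {check_total:,.1f} pounds")
print(f"inflection near X = {inflection:.2f} per cent")
print(f"maximum near X = {x_max:.2f} per cent, Y_c = {estimate(x_max):,.0f} pounds")
```

With unrounded constants the check total comes out nearer to 16,934 than the 16,933.8 obtained from the rounded column above.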

Correlation coefficient and standard error of estimate. To compute
$r_{Y.XX^2X^3}$ and $s_{Y.XX^2X^3}$ we need $\Sigma y^2_{c\,Y.XX^2X^3}$ and $\Sigma y^2_{s\,Y.XX^2X^3}$. These are:

$$\begin{aligned}
\Sigma y^2_{c\,Y.XX^2X^3} &= a\Sigma Y + b\Sigma XY + c\Sigma X^2Y + d\Sigma X^3Y - \bar{Y}\,\Sigma Y \\
&= (890.32389)(16{,}934) + (78.263630)(50{,}630) + (20.323899)(197{,}198) \\
&\quad + (-4.4648847)(822{,}884) - (1{,}128.93333)(16{,}934) \\
&= 255{,}624.
\end{aligned}$$

$$\Sigma y^2 = \Sigma Y^2 - \bar{Y}\,\Sigma Y = 19{,}377{,}528 - (1{,}128.93333)(16{,}934) = 260{,}171.$$

$$\Sigma y^2_{s\,Y.XX^2X^3} = \Sigma y^2 - \Sigma y^2_{c\,Y.XX^2X^3} = 260{,}171 - 255{,}624 = 4{,}547.$$

From these we obtain

$$s_{Y.XX^2X^3} = \sqrt{\frac{\Sigma y^2_{s\,Y.XX^2X^3}}{N}} = \sqrt{\frac{4{,}547}{15}} = 17.4 \text{ pounds};$$

$$r^2_{Y.XX^2X^3} = \frac{\Sigma y^2_{c\,Y.XX^2X^3}}{\Sigma y^2} = \frac{255{,}624}{260{,}171} = 0.983;$$

$$r_{Y.XX^2X^3} = 0.991.$$

Doolittle method. It must be confessed that, when there are as many
as four equations to solve simultaneously, the above procedure is
somewhat laborious. Furthermore, no check can be applied until the value
of d is obtained. Even that does not check the accuracy of any work
except the solution of the two equations (D and E) necessary to obtain
c and d. All of the preceding work could have been honeycombed with
errors, and still the solution of these two equations would check. It is
not until all of the constants are obtained that we have any real check on
the accuracy of the solution of the four normal equations. If the final
check fails, all of the work must be repeated.

Fortunately there is available for solving equations of this type
simultaneously a systematic method that provides frequent checks on
accuracy and is less laborious than the above procedure when there are four
or more equations. It is known as the Doolittle method, having been
developed by M. H. Doolittle. Like many labor-saving devices in
statistics, the method at first seems very confusing. To a certain
extent there is a substitution of complexity of procedure for repetitive
drudgery.

The Doolittle method is illustrated by Table 20.3. There are five
parts to this table:

Part 1. Normal equations. These are the same equations that are
found on page 495, but all of the terms have been put on the left side, so
that each equation equals zero.

Part 2. Forward solution. This solution obtains a value for d
(-4.4648919, found in row IV' of the constant-term column), and provides
the figures with which to obtain values for the other constants.

Part 3. Back solution. In this part we compute by a simple process
the values, in turn, of c, b, and a.

Part 4. Estimating equation. Note that this equation agrees, to five
digits, with the one previously obtained.

Part 5. Check equation. By substituting the values of the constants
obtained in the last normal equation, the preceding work is
checked. This step involves nothing new.

The entries in the forward solution are the most confusing, but if the
procedure and explanation outlined below are followed very carefully,
no trouble will be experienced in applying the Doolittle method to the
solution of equations of this type. It is desirable that work be done in
pencil first. This will permit some of the entries to be made in boldface,
as indicated in Table 20.3, merely by converting the pencil figures into
ink. The steps in the forward solution are as follows:

1. Divide the forward solution table into as many sections as there are
normal equations. Leave a space between sections, and separate also by
a horizontal line as shown. Allow in each section two more rows than
the section number; except that section one requires only two rows,
rather than three.

2. Label the columns: (1), (2), (3), (4), and Check total. Five
constants would require five normal equations, and therefore a column (5)
also. Enter also the descriptive matter in the stub, as shown in Table
20.3.

3. Record the appropriate normal-equation coefficients in the first
row of each section, being sure to indicate minus signs.

4. Total each normal equation algebraically; record the results in the
last column.

5. Make the following entries in the last row of each section:

-1.00000000 in row I' column (1);

-1.0000000 in row II' column (2);

-1.000000 in row III' column (3);

-1.00000 in row IV' column (4).

The number of zeros after 1. indicates the minimum number of decimal
places to carry computations in each section. The reason for dropping
an additional decimal place as computations proceed from section to
section is that errors from rounding the figures cumulate, and the
number of significant places becomes smaller. It is advisable, however,
never to record fewer than eight digits, including the decimal places.

6. Row I' is the result of dividing row I by the number in cell I(1)
and changing signs. The sum of the first five entries in this row should
be checked against the entry in the total column, and agreement
indicated by a check mark. Values in columns (2), (3), (4), and the
constant-term column of this row should be entered in boldface, as further
use is to be made of them. (As suggested above, this is most easily done
by reinforcing the original pencil entries with ink.)

7. The entries in the second row of section II, which is labeled I ×
I'(2), are a result of multiplying the items in row I by the number (in
boldface) in the cell which is an intersection of row I' and column (2).
In similar fashion, immediately below each row of normal-equation
coefficients are found the corresponding "product" rows. These rows
are called product rows because they are the result of making
multiplications, a description of each such operation being given in the stub of the
table. It helps to keep the process straight if we observe that the
multipliers are always the boldface numbers in the column bearing the same
parenthesized number as the section being computed; and that the
numbers multiplied are those in the row immediately above the boldface
number in question. A check on the accuracy of these entries is afforded
by totaling each row as it is computed, and indicating by a check mark
agreement with the entry in the total column.

8. The third row of section II, labeled ΣII, is the result of adding
algebraically the two rows above it in that section. Likewise the Σ row
in each section is a vertical summation of all the entries above the prime row
in the section in question. There is no separate Σ row in section I, since
the section has no product row, and therefore the normal-equation row
automatically becomes also the Σ row. Note that, as the computations
proceed from section to section, there is an increase in the number of
spaces in this row that are left vacant because the entries have become
zero. These Σ rows also should be added horizontally to obtain a check
with the total column.

9. Row II' is the result of dividing row ΣII by the value in ΣII(2) and
changing signs. So also each "prime" row (III', IV', and so forth) is
obtained by dividing each item in a given Σ row by the first entry in that
row, with sign changed. It is because of this fact that the first entry is
always -1. This entry is perhaps a sufficient description to remind us
of the nature of the operation. The prime rows should also check with
the total column. After the check has been made, enter the numbers to
the right of each -1 in ink, up to, but not including, the total column
entry.

The preceding explanation has referred specifically to the steps
involved in sections I and II. The other sections are computed in similar
fashion, each section requiring the previous computation of the other
sections. The only variation among the different sections lies in the
number of product rows and the number of vacant spaces to the left in
some of the rows. As previously noted, we have obtained (in row IV' of
the constant-term column) the value of d, which is -4.4648919. We are
now ready to proceed with the back solution to obtain a, b, and c.

The back solution occasions no difficulty. It consists merely in
substituting the values of the constants, as obtained, in the derived
equations III', II', and I'. The entries in the first column of the back
solution table are the boldface items in the constant-term column of the
forward solution table. The item in the last row of this column
(-4.4648919) is d. This value is recorded in the last row of the total
column. The entries in the d column are the boldface items of Column (4),
above, multiplied by -4.4648919 (the value of d). The sum of the items in
the third row is c (about 20.324), which is entered in the total column,
opposite c. The entries in the c column are the boldface items of
Column (3), above, multiplied by c. The sum of the items in the second
row is b. The entry in the b column is the boldface entry in Column (2),
above, multiplied by b. The sum of the items in the first row is a. It
will be noticed that, in using the back solution table, we record the
column to the right first and then proceed to the left; and in the total
column we proceed from bottom to top. Proceeding in this fashion is
rather unusual, but most convenient in this case.
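
The forward and back solutions described above can be mirrored in code. The sketch below is one reading of the procedure and not a transcription of Table 20.3: it keeps a Σ row and a prime row for each section, forms the product rows from the multipliers in the prime rows, and carries a check-total column. Exact fractions are an assumption made here, which means the horizontal check can never actually fail and is kept only to show where it belongs. Like the tabular method itself, the routine relies on the coefficients being symmetric, as they are in normal equations. All names are hypothetical.

```python
from fractions import Fraction

def doolittle_solve(coeffs, consts):
    """Solve symmetric normal equations by a Doolittle-style forward and back solution."""
    n = len(consts)
    sigma_rows, prime_rows = [], []
    for i in range(n):
        # Normal equation with all terms on the left: coefficients, the
        # constant with sign changed, and finally the check total.
        row = [Fraction(v) for v in coeffs[i]] + [Fraction(-consts[i])]
        row.append(sum(row))
        # Product rows: each earlier sigma row times the multiplier found in
        # its prime row, in the column numbered like the current section.
        for j in range(i):
            multiplier = prime_rows[j][i]
            row = [r + multiplier * s for r, s in zip(row, sigma_rows[j])]
        # Horizontal check of the sigma row against the total column.
        if sum(row[:-1]) != row[-1]:
            raise ArithmeticError(f"check total fails in section {i + 1}")
        sigma_rows.append(row)
        # Prime row: sigma row divided by its leading entry, signs changed.
        prime_rows.append([-v / row[i] for v in row])
    # Back solution: the last constant first, then upward through the prime rows.
    values = [Fraction(0)] * n
    for i in reversed(range(n)):
        values[i] = prime_rows[i][n] + sum(
            prime_rows[i][j] * values[j] for j in range(i + 1, n))
    return values

power_sums = [15, 42, 162, 672, 2934, 13272, 61542]
coeffs = [power_sums[i:i + 4] for i in range(4)]
consts = [16934, 50630, 197198, 822884]
a, b, c, d = (float(v) for v in doolittle_solve(coeffs, consts))
print(f"a = {a:.5f}, b = {b:.6f}, c = {c:.6f}, d = {d:.7f}")
```

Carried out by hand with a fixed number of decimal places, the same steps accumulate small rounding differences, which is why the tabular value of d need not match the exact one in every digit.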

The estimating equation arrived at by the Doolittle method,

$$Y_c = 890.32 + 78.264X + 20.324X^2 - 4.4649X^3,$$

agrees with the equation previously obtained on page 497.

In the right-hand column of the Doolittle back solution table is
provided a convenient place for computation of the explained sum of squares
by the expression

$$\Sigma Y^2_{c\,Y.XX^2X^3} = a\Sigma Y + b\Sigma XY + c\Sigma X^2Y + d\Sigma X^3Y.$$

Note also that $\Sigma Y$, $\Sigma XY$, $\Sigma X^2Y$, and $\Sigma X^3Y$ (with signs changed) are
found in the constant-term column of the forward solution table, the first
row of each section, in that order from top to bottom; while a, b, c, and d
are arranged, in corresponding order, in the left-hand part of the back
solution table. The computations show that

$$\Sigma Y^2_{c\,Y.XX^2X^3} = 19{,}372{,}982, \quad\text{and}$$

$$\Sigma y^2_{c\,Y.XX^2X^3} = \Sigma Y^2_{c\,Y.XX^2X^3} - \bar{Y}\,\Sigma Y = 19{,}372{,}982 - (1{,}128.93333)(16{,}934) = 255{,}625.$$
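
The closing figures for the third-degree curve can be retraced as follows. This is a minimal sketch using the constants and summations quoted earlier in the chapter; the names are illustrative, and the divisor N = 15 in the standard error is assumed by analogy with the second-degree case.

```python
import math

N = 15
sums = [16934, 50630, 197198, 822884]        # sum Y, sum XY, sum X^2 Y, sum X^3 Y
constants = [890.32389, 78.263630, 20.323899, -4.4648847]   # a, b, c, d
sum_Y2 = 19377528

mean_Y = sums[0] / N
sum_Yc2 = sum(k * s for k, s in zip(constants, sums))   # sum of squared computed Y
explained = sum_Yc2 - mean_Y * sums[0]
total = sum_Y2 - mean_Y * sums[0]
unexplained = total - explained
r_squared = explained / total
r = math.sqrt(r_squared)
std_error = math.sqrt(unexplained / N)

print(f"sum of Yc^2 = {sum_Yc2:,.0f}, explained = {explained:,.0f}, total = {total:,.0f}")
print(f"r^2 = {r_squared:.3f}, r = {r:.3f}, s = {std_error:.1f} pounds")
```

The explained variation is the small difference of two very large numbers, so a change in the seventh or eighth figure of a constant is enough to move it by a unit; this accounts for the 255,624 and 255,625 obtained by the two methods.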

USE OF TRANSFORMATIONS

Instead of using a second-degree curve, or a curve of higher order, as an
estimating equation, we may convert the readings for one or both variables
into a different form. The most frequently used transformations involve
logarithms, reciprocals, roots or powers, and logarithms of logarithms.
Frequently, a transformation will show a linear relationship between the
two converted series. We shall consider the use of logarithms, roots, and
reciprocals for the data of diameter and volume of ponderosa pine trees
which were used earlier in this chapter. First we shall examine the
transformations graphically. Correlation analysis of the data will then be
made for the transformations that appear most appropriate. The other
transformations will be dealt with in symbolic terms only.

Preliminary examination. Based upon our experience with the
semi-logarithmic chart in Chapter 5, it seems reasonable to think that the
scatter diagram of Chart 20.1 might straighten out if we were to use a
grid with a logarithmic vertical scale. In this event, we would use an
estimating equation of the type⁴

$$(\log Y)_c = \log a + X \log b.$$

Such a scatter diagram is shown in Chart 20.4, and it is clear that the
relationship between log Y and X is not linear.

In Chart 20.5, the same data have been plotted on a grid having both
vertical and horizontal logarithmic scales. This transformation calls for
the use of an estimating equation of the type

$$(\log Y)_c = \log a + b \log X.$$

⁴ The symbol $(\log Y)_c$ is used, rather than $\log Y_c$, to make clear that we are dealing
with "the computed value of log Y," not "the logarithm of the computed value of
Y." For parallel reasons, use is made in the following paragraphs of $(\sqrt{Y})_c$ rather
than $\sqrt{Y_c}$, and $(1/Y)_c$ rather than $1/Y_c$.

Chart 20.4. Diameter and Volume of Twenty Ponderosa Pine Trees Plotted
on a Semi-logarithmic Grid. Data of Table 20.1. (Vertical axis: volume,
board feet ÷ 10; horizontal axis: diameter in inches.)

The scatter diagram of Chart 20.5 indicates that the relationship between
log Y and log X is virtually linear.*

Another transformation is possibly more logical than either of the two
already tried. Since the volume of a cylinder is directly related to its
length and to the square of the radius (or diameter) of its circular cross
section, it would seem reasonable to try a transformation involving $\sqrt{Y}$
and X. Of course, a tree is not a cylinder,† but Chart 20.6 shows a scatter
diagram which appears to be more nearly linear than the preceding one.
For this relationship, the estimating equation would be of the type‡

$$(\sqrt{Y})_c = a + bX.$$

Although it is not reasonable to expect that $1/Y$ and X will produce a
linear scatter diagram for these data, Chart 20.7 has, nevertheless, been
prepared. It is clear that this relationship is not suitable for these data,
although it is sometimes useful for other series. The estimating equation
would be of the type‡

$$\left(\frac{1}{Y}\right)_c = a + bX.$$

* Occasionally an estimating equation of the type

$$Y_c = a + b \log X$$

is appropriate. For an illustration, see F. E. Croxton, Elementary Statistics With
Applications in Medicine, Prentice-Hall, Inc., New York, 1953, pp. 152-157.

† See page 234 of the second edition (1942) of the reference mentioned below Table
20.1.

‡ See note 4.

Chart 20.5. Diameter and Volume of Twenty Ponderosa Pine Trees and
Estimating Equation of Type $(\log Y)_c = \log a + b \log X$, with Zones of ±1, ±2,
and ±3 Standard Errors of Estimate, Shown on a Logarithmic Grid. Data of
Table 20.4. Estimating equation shown by solid line. (Vertical axis: volume,
board feet ÷ 10, logarithmic scale; horizontal axis: diameter in inches,
logarithmic scale.)

Chart 20.6. Diameter and Square Root of Volume of Twenty Ponderosa
Pine Trees and Estimating Equation of Type $(\sqrt{Y})_c = a + bX$, with Zones
of ±1, ±2, and ±3 Standard Errors of Estimate, Shown on an Arithmetic
Grid. Data of Table 20.5. Estimating equation shown by solid line. A square
root vertical scale could have been used for this chart. A grid using a square root
vertical scale and an arithmetic horizontal scale was not used here since paper ruled
in this manner is not readily available to the reader. The equally spaced vertical
scale values could be 0, 1, 4, 9, 16, 25, and so on. (Vertical axis: square root of
(volume ÷ 10); horizontal axis: diameter in inches.)

The reader may have noticed that the grids used for Charts 20.4 and
20.5 were so designed that the actual X values and Y values were plotted.
Charts 20.6 and 20.7 did not employ special grids, but used arithmetic
scales, and the $\sqrt{Y}$ and $1/Y$ values were plotted against the X values.
Special grids could have been used for Charts 20.6 and 20.7; they were not
used because they are not readily available to the reader.

* See note 4.

Chart 20.7. Diameter and Reciprocal of Volume of Twenty Ponderosa Pine
Trees, Shown on an Arithmetic Grid. Data from Table 20.1, which does not
show the reciprocals of the Y values. (Vertical axis: reciprocal of
(volume ÷ 10).)

We shall now proceed to compute the various correlation measures for
the log Y, log X relationship and for the $\sqrt{Y}$, X relationship. The
log Y, X relationship and the $1/Y$, X relationship will be considered in
terms of symbols only. Because each of the four equation types which
are involved calls for but two unknowns in the estimating equation, all
procedures will parallel those for linear correlation of ungrouped data as
described in Chapter 19. The formulas will be the same as those
previously used, except that (1) log Y, $\sqrt{Y}$, or $1/Y$ will be substituted for Y,
and (2) log X will be substituted for X when we use the log Y, log X
relationship.

Since the four transformations which will be considered involve the
logarithms, square roots, or reciprocals of the Y values, two points should
be borne in mind: (1) the least-squares fit does not minimize the sum of
the squares of the $Y - Y_c$ values; it minimizes the sum of the squares
of the deviations of the transformed observed Y values from the computed
transformed Y values; and (2) when stating the amount of dispersion of the
actual Y values from the estimating equation, the standard error of
estimate must be added to and subtracted from the computed Y values when
both are in terms of transformed units; after the addition and subtraction,
the results may be re-converted to units of the original Y series.
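
The graphic comparison of the four transformations can be supplemented by a rough numerical one. The sketch below computes the linear coefficient of correlation between the transformed series for each candidate, using the twenty pairs of diameters and volumes listed in Table 20.4. It is only an illustration of point (1): each coefficient measures straightness of fit in the transformed units, not closeness of fit to the original Y values. The function and dictionary names are hypothetical.

```python
import math

# Diameter (inches) and volume (board feet / 10) of the twenty trees.
diameter = [36, 28, 28, 41, 19, 32, 22, 38, 25, 17,
            31, 20, 25, 19, 39, 33, 17, 37, 23, 39]
volume = [192, 113, 88, 294, 28, 123, 51, 252, 56, 16,
          141, 32, 86, 21, 231, 187, 22, 205, 57, 265]

def pearson_r(xs, ys):
    """Product-moment coefficient of correlation between two series."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

log_x = [math.log10(x) for x in diameter]
log_y = [math.log10(y) for y in volume]
transformations = {
    "Y, X": (diameter, volume),
    "log Y, X": (diameter, log_y),
    "log Y, log X": (log_x, log_y),
    "sqrt Y, X": (diameter, [math.sqrt(y) for y in volume]),
    "1/Y, X": (diameter, [1 / y for y in volume]),
}
r_values = {name: pearson_r(xs, ys) for name, (xs, ys) in transformations.items()}

for name, value in r_values.items():
    print(f"{name:14s} r = {value:+.3f}")
```

A high coefficient alone does not show that a transformation is suitable; the scatter diagrams remain the better guide to whether curvature is still present.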

The log Y, log X relationship. Chart 20.5 indicated that the
relationship between diameter and volume was nearly linear when both
series were expressed in terms of logarithms. The estimating equation
is of the type

$$(\log Y)_c = \log a + b \log X,$$

and the constants log a and b are obtained by solving simultaneously the
normal equations

$$\begin{aligned}
\text{I.}\quad & \Sigma \log Y = N \log a + b\,\Sigma \log X;\\
\text{II.}\quad & \Sigma(\log X \cdot \log Y) = \log a\,\Sigma \log X + b\,\Sigma(\log X)^2.
\end{aligned}$$

Substituting, in these equations, the values from Table 20.4
(logarithms are in Appendix R) gives

I. 38.727389 = 20 log a + 28.728012b;

II. 56.619891 = 28.728012 log a + 41.581145b.

Simultaneous solution yields

log a = -2.569125 and
b = 3.136656.

The estimating equation may now be written

$$(\log Y)_c = -2.569125 + 3.136656 \log X.$$
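
The simultaneous solution can be verified from the Table 20.4 totals alone. The sketch below solves the two normal equations by determinants, which is a convenience adopted here rather than the book's stated procedure, and converts log a back to a. Variable names are illustrative.

```python
# Totals from Table 20.4.
N = 20
sum_logX, sum_logY = 28.728012, 38.727389
sum_logX_logY, sum_logX2 = 56.619891, 41.581145

# Normal equations:
#   sum_logY      = N * log_a        + b * sum_logX
#   sum_logX_logY = log_a * sum_logX + b * sum_logX2
det = N * sum_logX2 - sum_logX ** 2
b = (N * sum_logX_logY - sum_logX * sum_logY) / det
log_a = (sum_logY * sum_logX2 - sum_logX * sum_logX_logY) / det

a = 10 ** log_a   # constant of the equivalent power curve Y_c = a * X**b

print(f"log a = {log_a:.6f}, b = {b:.6f}, a = {a:.6f}")
```

Because the denominator is the small difference of two numbers near 830, the six-place logarithms of the table are needed to hold b to six places.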

Since the estimating equation which we are using is the linear form of

$$Y_c = aX^b,$$

the estimating equation, in terms of the original data, is

$$Y_c = 0.002697X^{3.136656}.$$

(Note that log a = -2.569125 = 7.430875 - 10, and its antilog is
0.002697.) The estimating equation is shown on Chart 20.5, which has
logarithmic scales, and on Chart 20.8, which has arithmetic scales.

TABLE 20.4

Computation of Values Used for Determining Measures of Relationship
Between Logarithm of Diameter and Logarithm of Volume of
Twenty Ponderosa Pine Trees

(Logarithms are obtained from Appendix R.)

 Diameter     Volume
 at breast    (board
 height       feet
 (inches)     ÷ 10)*
     X           Y        log X       log Y     log X · log Y   (log X)²    (log Y)²

    36         192      1.556303    2.283301      3.553508      2.422079    5.213463
    28         113      1.447158    2.053078      2.971128      2.094266    4.215129
    28          88      1.447158    1.944483      2.813974      2.094266    3.781014
    41         294      1.612784    2.468347      3.980911      2.601072    6.092737
    19          28      1.278754    1.447158      1.850559      1.635212    2.094266
    32         123      1.505150    2.089905      3.145621      2.265477    4.367703
    22          51      1.342423    1.707570      2.292281      1.802100    2.915795
    38         252      1.579784    2.401401      3.793695      2.495717    5.766727
    25          56      1.397940    1.748188      2.443862      1.954236    3.056161
    17          16      1.230449    1.204120      1.481608      1.514005    1.449905
    31         141      1.491362    2.149219      3.205264      2.224161    4.619142
    20          32      1.301030    1.505150      1.958245      1.692679    2.265477
    25          86      1.397940    1.934498      2.704312      1.954236    3.742283
    19          21      1.278754    1.322219      1.690793      1.635212    1.748263
    39         231      1.591065    2.363612      3.760660      2.531488    5.586662
    33         187      1.518514    2.271842      3.449824      2.305885    5.161266
    17          22      1.230449    1.342423      1.651783      1.514005    1.802100
    37         205      1.568202    2.311754      3.625297      2.459258    5.344207
    23          57      1.361728    1.755875      2.391024      1.854303    3.083097
    39         265      1.591065    2.423246      3.855542      2.531488    5.872121

   569       2,460     28.728012   38.727389     56.619891     41.581145   78.177518

* See note to Table 20.1.

For source of data, see Table 20.1.

Total variation is*

$$\Sigma(\log y)^2 = \Sigma(\log Y)^2 - \overline{\log Y}\;\Sigma \log Y,$$

where

$$\overline{\log Y} = \frac{\Sigma \log Y}{N} = \frac{38.727389}{20} = 1.93636945.$$

The numerical value for total variation is

$$\Sigma(\log y)^2 = 78.177518 - (1.93636945)(38.727389) = 3.186985.$$

* Note that $\Sigma(\log y)^2 = \Sigma[\log Y - \overline{\log Y}]^2$. It is not
$\Sigma[\log (Y - \bar{Y})]^2$. Similarly, $\Sigma(\log y)^2_c = \Sigma[(\log Y)_c - \overline{\log Y}]^2$ and
$\Sigma(\log y)^2_s = \Sigma[\log Y - (\log Y)_c]^2$.

Chart 20.8. Diameter and Volume of Twenty Ponderosa Pine Trees and
Estimating Equation of Type $(\log Y)_c = \log a + b \log X$, with Zones of ±1,
±2, and ±3 Standard Errors of Estimate, Shown on an Arithmetic Grid.
Data of Table 20.4. Estimating equation shown by solid line. (Vertical axis:
volume, board feet ÷ 10.)

Explained variation is'* 


® If we were computing 2 (log y)l anc 2(log ?/)J from both {log T)^ log « + 
b log X and (log Y)e - log a -f- A log h, v^c would probably wish to distinguish, by 
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S(log y)l log o S log T 4- 5 S(log X ■ log Y) — (log 7)2 log 7, 

- (-2.5G9125) (38.727389) + (3.136656)(56.619891) 

- (1.9363694.5) (38.727389), 

= 3.111085. 

Unexplained variation may now be obtained by subtraction: 

2 (log y)* = 2 (log yV - 2 (log y)“, 

= 3,18698') - 3.111085 = 0.07.5900. 

The coefficients of deterinination and correlation are 

^log r.iijg A' 

^ lug y I'. 14 X 

We mav show a sifin for the correlation coefficient, )>ecause the relation- 
ship between lo^' Y and log .Y is linear. 

Since only two constants are involved in the estimating equation, we 
may compute the coefficient of correlation by using the modified produet- 
moment formula. It will be recalled that this expression allows us to 
obtain the correlation coefficient without first ascertaining the constants 
in the estimating equation. For log Y and log 


^ /a) ;: 

2 (log //)- 

+0.988. 


3.11 1085 

' 0.9^0 and 

3.18G985 


^log 7.log A' 

^ A2(log .Y • log 7) -_(2Jog Y)(2 log 7) 

V[v'2(log Y)- - (2’log Y)4[Y2{]og'r7 - ('2 log Yj^]’ 

201.56 619891) - (28.728012)'3S.727389) 

" V''[20(41..5S1145i - (28.728’012)''j[20t,78.l77.518) - (38 727389)=] 
= +0.988. 


The standard error of estimate is

$$s_{\log Y.\log X} = \sqrt{\frac{\sum(\log y)_s^2}{N}} = \sqrt{\frac{0.075900}{20}} = 0.061604.$$

The zones of ±1, 2, and 3 standard errors of estimate are shown on
Charts 20.5 and 20.8. Note that, on Chart 20.8, the zones of scatter
depart more and more from the estimating equation as the value of X
increases. On Chart 20.5, the zones are always equidistant because the
scales are logarithmic.
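The arithmetic of this section can be retraced in a few lines of Python. What follows is only a sketch: it starts from the column sums quoted in the text (N = 20, Σ log X = 28.728012, and so on), and the function name `loglog_fit` and the dictionary keys are illustrative choices, not notation used elsewhere in this chapter.

```python
import math

def loglog_fit(n, sum_lx, sum_ly, sum_lx2, sum_ly2, sum_lxly):
    """Fit (log Y)c = log a + b log X from column sums of the logarithms."""
    sxy = n * sum_lxly - sum_lx * sum_ly      # N*S(logX logY) - S(logX) S(logY)
    sxx = n * sum_lx2 - sum_lx ** 2           # N*S(logX)^2 - (S logX)^2
    syy = n * sum_ly2 - sum_ly ** 2           # N*S(logY)^2 - (S logY)^2
    b = sxy / sxx
    log_a = (sum_ly - b * sum_lx) / n
    total = syy / n                           # total variation of log Y
    explained = b * sxy / n                   # explained variation
    unexplained = total - explained
    return {
        "log_a": log_a,
        "b": b,
        "total": total,
        "explained": explained,
        "unexplained": unexplained,
        "r": sxy / math.sqrt(sxx * syy),      # modified product-moment formula
        "s": math.sqrt(unexplained / n),      # standard error of estimate (log units)
    }

# Sums as quoted in the text for the twenty trees.
fit = loglog_fit(20, 28.728012, 38.727389, 41.581145, 78.177518, 56.619891)
for name, value in fit.items():
    print(f"{name:12s} {value: .6f}")
```

Because the sums are rounded to six decimals, the last digit or two of each result may differ slightly from the figures printed above.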


means of symbols or otherwise, between the two methods of obtaining explained
variation and unexplained variation.
It may be well to illustrate the computation of one $Y_c$ value and to
show how the standard error of estimate is employed. To ascertain the
value of $(\log Y)_c$ when X = 30 (for which log X = 1.477121), we write

$$(\log Y)_c = -2.569125 + (3.136656)(1.477121) = 2.064095.$$

The antilog of this is 115.9, so that $Y_c$ = 115.9 tens of board feet. To
obtain the limits of ± one standard error of estimate, we write

$$\begin{aligned}
\text{antilog}\,[(\log Y)_c \pm s_{\log Y.\log X}] &= \text{antilog}\,(2.064095 \pm 0.061604), \\
&= \text{antilog}\ 2.002491 \text{ and } 2.125699, \\
&= 100.6 \text{ and } 133.6 \text{ tens of board feet.}
\end{aligned}$$

For the limits of ± two standard errors of estimate, we compute

$$\text{antilog}\,[(\log Y)_c \pm 2s_{\log Y.\log X}] = \text{antilog}\,(2.064095 \pm 0.123208) = 87.3 \text{ and } 153.9 \text{ tens of board feet.}$$

For the limits of ± three standard errors of estimate:

$$\text{antilog}\,[(\log Y)_c \pm 3s_{\log Y.\log X}] = \text{antilog}\,(2.064095 \pm 0.184812) = 75.7 \text{ and } 177.4 \text{ tens of board feet.}$$

In a similar manner, limits may be obtained for estimates of volume
based upon other values of X. It must be remembered, of course, that
the $(\log Y)_c$ value and the $s_{\log Y.\log X}$ value must be combined before
antilogs are looked up in the table. Alternatively, the standard error of
estimate may be applied to the $Y_c$ values in the form of a ratio. For
example,

$$\text{antilog}\ s_{\log Y.\log X} = \text{antilog}\ 0.061604 = 1.1524 \quad\text{and}$$

$$\text{antilog}\,(-s_{\log Y.\log X}) = \text{antilog}\,(-0.061604) = \text{antilog}\,(9.938396 - 10) = 0.8678.$$

Any $Y_c$ values computed from our estimating equation may now be multiplied
by these ratios to obtain the limits of ± one standard error of
estimate. For the case where X = 30 and $Y_c$ = 115.9, we get

115.9 × 1.1524 = 133.6 and
115.9 × 0.8678 = 100.6 tens of board feet,

the same values that were obtained before. For limits of ± two or three
standard errors of estimate, the procedure is the same, except that the
initial step involves multiplying $s_{\log Y.\log X}$ by 2 or 3, or the ratios just
obtained may be squared and cubed.
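The two routes to the limits, adding the standard error in logarithms or multiplying the estimate by a ratio, can be compared with a short Python sketch. The constants are those printed in the text; the function names `log_zone` and `ratio_zone` are hypothetical helpers introduced only for this illustration.

```python
import math

LOG_A = -2.569125    # log a of the estimating equation
B = 3.136656         # b of the estimating equation
S_LOG = 0.061604     # standard error of estimate, in logarithms

def log_zone(x, k=1):
    """(lower, estimate, upper) for +/- k standard errors, combined before taking antilogs."""
    log_yc = LOG_A + B * math.log10(x)
    return (10 ** (log_yc - k * S_LOG), 10 ** log_yc, 10 ** (log_yc + k * S_LOG))

def ratio_zone(yc, k=1):
    """Same limits obtained by applying the ratio antilog(k*s) to an estimate yc."""
    ratio = 10 ** (k * S_LOG)        # 1.1524 for k = 1; squared and cubed for k = 2, 3
    return (yc / ratio, yc * ratio)  # dividing by the ratio equals multiplying by 0.8678

for k in (1, 2, 3):
    lower, estimate, upper = log_zone(30, k)
    print(k, round(lower, 1), round(estimate, 1), round(upper, 1))
```

Both functions give the same limits, which is why the ratios may simply be squared or cubed for two or three standard errors.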
The √Y, X relationship. Because the scatter diagram of Chart
20.6 appears to be more nearly linear than does that of Chart 20.5, we
should expect to obtain a higher coefficient of determination or correlation
for the √Y, X relationship than for the log Y, log X relationship. However,
the coefficients which we are about to compute cannot be much
TABLE 20.5

Computation of Values Used for Determining Measures of Relationship
Between Diameter and Square Root of Volume of Twenty Ponderosa
Pine Trees

(Square roots may be obtained from Appendix Q.)

Diameter at      Volume*
breast height    (board feet
(inches)         ÷ 10)
     X               Y           √Y        X√Y          X²

    36             192         13.86      498.96      1,296
    28             113         10.63      297.64        784
    28              88          9.38      262.64        784
    41             294         17.15      703.15      1,681
    19              28          5.29      100.51        361
    32             123         11.09      354.88      1,024
    22              51          7.14      157.08        484
    38             252         15.87      603.06      1,444
    25              56          7.48      187.00        625
    17              16          4.00       68.00        289
    31             141         11.87      367.97        961
    20              32          5.66      113.20        400
    25              86          9.27      231.75        625
    19              21          4.58       87.02        361
    39             231         15.20      592.80      1,521
    33             187         13.67      451.11      1,089
    17              22          4.69       79.73        289
    37             205         14.32      529.84      1,369
    23              57          7.55      173.65        529
    39             265         16.28      634.92      1,521

   569           2,460        204.98    6,494.91     17,437

* See note to Table 20.1.

For source of data, see Table 20.1.


higher than those just obtained, since we found $r^2_{\log Y.\log X}$ = 0.976 and
$r_{\log Y.\log X}$ = +0.988.

The estimating equation is of the type

$$(\sqrt{Y})_c = a + bX$$

and the normal equations are

I. $\sum\sqrt{Y} = Na + b\sum X$;
II. $\sum X\sqrt{Y} = a\sum X + b\sum X^2$.

Substituting values from Table 20.5 (squares and square roots are given
in Appendix Q), we have

I. 204.98 = 20a + 569b; and
II. 6,494.91 = 569a + 17,437b;

which, when solved simultaneously, give

a = −4.8587836 and
b = 0.5310293.

The estimating equation, then, is

$$(\sqrt{Y})_c = -4.86 + 0.531X,$$

which is shown on Chart 20.6, where √Y values and X values are plotted,
and on Chart 20.9, on which the Y and X values appear.
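The simultaneous solution of the two normal equations can be checked with a short Python sketch. It uses the totals of Table 20.5 and solves the pair of equations by determinants; the function name `solve_normal_equations` is an illustrative assumption, not a routine referred to in the text.

```python
def solve_normal_equations(n, sum_x, sum_y, sum_xy, sum_x2):
    """Solve  sum_y = n*a + b*sum_x  and  sum_xy = a*sum_x + b*sum_x2  for (a, b)."""
    det = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b

# Here "y" stands for the square root of volume, so sum_y is the total of the sqrt(Y) column.
a, b = solve_normal_equations(n=20, sum_x=569, sum_y=204.98,
                              sum_xy=6494.91, sum_x2=17437)
print(round(a, 4), round(b, 4))
```

The constant a agrees with the printed value to about six significant figures; the small difference in the last decimals arises from the order in which the rounding is done.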

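Total variation is computed

$$\sum(\sqrt{y})^2 = \sum(\sqrt{Y})^2 - \overline{\sqrt{Y}}\sum\sqrt{Y} = \sum Y - \overline{\sqrt{Y}}\sum\sqrt{Y},$$

where $\overline{\sqrt{Y}} = \dfrac{\sum\sqrt{Y}}{N} = \dfrac{204.98}{20} = 10.249$. Total variation is

$$\sum(\sqrt{y})^2 = 2,460 - (10.249)(204.98) = 359.1600.$$

Explained variation is

$$\begin{aligned}
\sum(\sqrt{y})_c^2 &= a\sum\sqrt{Y} + b\sum X\sqrt{Y} - \overline{\sqrt{Y}}\sum\sqrt{Y}, \\
&= (-4.8587836)(204.98) + (0.5310293)(6,494.91) - (10.249)(204.98), \\
&= 352.1940.
\end{aligned}$$

Unexplained variation is

$$\sum(\sqrt{y})_s^2 = \sum(\sqrt{y})^2 - \sum(\sqrt{y})_c^2 = 359.1600 - 352.1940 = 6.9660.$$

The coefficient of determination is obtained from

$$r^2_{\sqrt{Y}.X} = \frac{\sum(\sqrt{y})_c^2}{\sum(\sqrt{y})^2} = \frac{352.1940}{359.1600} = 0.981.$$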
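The partition of total variation into explained and unexplained parts may be verified with the following Python sketch. It takes the totals of Table 20.5 and the constants a and b as printed in the text; the variable names are arbitrary labels chosen for this illustration.

```python
N = 20
SUM_SQRT_Y = 204.98        # total of the sqrt(Y) column
SUM_Y = 2460               # total of (sqrt(Y))**2, that is, of the Y column
SUM_X_SQRT_Y = 6494.91     # total of the X*sqrt(Y) column
A, B = -4.8587836, 0.5310293

mean_sqrt_y = SUM_SQRT_Y / N                      # 10.249
correction = mean_sqrt_y * SUM_SQRT_Y             # subtracted from both sums below
total = SUM_Y - correction                        # total variation
explained = A * SUM_SQRT_Y + B * SUM_X_SQRT_Y - correction
unexplained = total - explained
r2 = explained / total                            # coefficient of determination

print(round(total, 4), round(explained, 4), round(unexplained, 4), round(r2, 3))
```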


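¹⁰ Note that $\sum(\sqrt{y})^2 = \sum(\sqrt{Y} - \overline{\sqrt{Y}})^2$. It is not $\sum(\sqrt{Y - \bar{Y}})^2$. Similarly,
$\sum(\sqrt{y})_c^2 = \sum[(\sqrt{Y})_c - \overline{\sqrt{Y}}]^2$ and $\sum(\sqrt{y})_s^2 = \sum[\sqrt{Y} - (\sqrt{Y})_c]^2$.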
This value is slightly larger than that obtained from use of the second-degree
equation (coefficient of determination 0.978) and also larger than when the
logarithmic estimating equation ($r^2_{\log Y.\log X}$ = 0.976) was employed. The
coefficient of correlation is the square root of the coefficient of determination,

$$r_{\sqrt{Y}.X} = +0.990,$$

or, if a and b have not been computed, it may be ascertained from

$$\begin{aligned}
r_{\sqrt{Y}.X} &= \frac{N\sum X\sqrt{Y} - (\sum X)(\sum\sqrt{Y})}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y - (\sum\sqrt{Y})^2]}}, \\
&= \frac{20(6,494.91) - (569)(204.98)}{\sqrt{[20(17,437) - (569)^2][20(2,460) - (204.98)^2]}}, \\
&= +0.990.
\end{aligned}$$

Chart 20.9. Diameter and Volume of Twenty Ponderosa Pine Trees and
Estimating Equation of Type $(\sqrt{Y})_c = a + bX$, with Zones of ±1, ±2, and
±3 Standard Errors of Estimate, Shown on an Arithmetic Grid. Data of
Table 20.5. Estimating equation shown by solid line. (Vertical axis: volume,
board feet ÷ 10.)

The standard error of estimate is

$$s_{\sqrt{Y}.X} = \sqrt{\frac{\sum(\sqrt{y})_s^2}{N}} = \sqrt{\frac{6.9660}{20}} = 0.590.$$

The zones of ±1, 2, and 3 standard errors of estimate appear on Charts
20.6 and 20.9. As in the case of the logarithmic relationship, the zones
become wider, in absolute terms, as X increases. This may be seen in
Chart 20.9. On Chart 20.6 the zones are equidistant because √Y
values were plotted.

When X = 30, the value of $(\sqrt{Y})_c$ is obtained as follows:

$$(\sqrt{Y})_c = -4.86 + (0.531)(30) = 11.07.$$

Since $(\sqrt{Y})_c$ = 11.07, $Y_c = (11.07)^2$ = 122.5 tens of board feet. To get
the limits of ± one standard error of estimate we compute

$$[(\sqrt{Y})_c \pm s_{\sqrt{Y}.X}]^2 = (11.07 \pm 0.590)^2 = 109.8 \text{ and } 136.0 \text{ tens of board feet.}$$

The limits of ± two standard errors of estimate are computed from

$$[(\sqrt{Y})_c \pm 2s_{\sqrt{Y}.X}]^2 = [11.07 \pm 2(0.590)]^2 = 97.8 \text{ and } 150.1 \text{ tens of board feet.}$$

For the limits of ± three standard errors of estimate,

$$[(\sqrt{Y})_c \pm 3s_{\sqrt{Y}.X}]^2 = [11.07 \pm 3(0.590)]^2 = 86.5 \text{ and } 164.9 \text{ tens of board feet.}$$

In a similar manner, limits may be computed for other estimates of
volume. It is important to remember that the $(\sqrt{Y})_c$ and the $s_{\sqrt{Y}.X}$
values must be combined before the squares are obtained.
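The rule that the estimate and the standard error are combined before squaring can be illustrated with a small Python sketch. The rounded constants are those used in the worked example for X = 30; `sqrt_zone` is a hypothetical helper name.

```python
A, B = -4.86, 0.531    # rounded constants of the estimating equation
S_SQRT = 0.590         # standard error of estimate, in square-root units

def sqrt_zone(x, k=1):
    """(lower, estimate, upper) in tens of board feet for +/- k standard errors."""
    root = A + B * x                       # estimate of sqrt(Y)
    return ((root - k * S_SQRT) ** 2,      # combine first, then square
            root ** 2,
            (root + k * S_SQRT) ** 2)

for k in (1, 2, 3):
    lower, estimate, upper = sqrt_zone(30, k)
    print(k, round(lower, 1), round(estimate, 1), round(upper, 1))
```

Because squaring is applied after the standard error is added or subtracted, the upper limit lies farther from the estimate than the lower limit, and the zone widens as X increases.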

Comparison of the three non-linear relationships for diameter
and volume of trees. Although it is clear that any one of the three
non-linear estimating equations is preferable to the linear equation for
describing the correlation between the diameter and volume of ponderosa
pine trees, it is not at all obvious which one of the three non-linear
equations is superior, since all of them give coefficients of determination which
differ only in the third decimal place. All round to 0.98. It is rather
unusual to find that several equation types give coefficients so nearly
alike that there is little room for choice between them. However, it must
be remembered that, in one sense, the coefficients are not strictly
comparable. The second-degree curve explained 97.8 per cent (coefficient of
determination 0.978) of the variation in the Y values. The logarithmic estimating
equation explained 97.6 per cent ($r^2_{\log Y.\log X}$ = 0.976) of the variation in
the logarithms of the Y values. The estimating equation using √Y and
X explained 98.1 per cent ($r^2_{\sqrt{Y}.X}$ = 0.981) of the variation in the square
roots of the Y values.

The three standard errors of estimate cannot be compared with each
other, since they are in different units. For the second-degree curve, the
standard error of estimate is always 13.2 board feet ÷ 10. When the
logarithmic estimating equation is used, the standard error of estimate is
always 15.2 per cent of the estimate in a positive direction or 13.2 per
cent of the estimate in a negative direction. As pointed out in Chapter
19, the standard error of estimate is an over-all measure of the dispersion
of actual values from estimated values, which is nevertheless applied to
specific estimates. Table 20.6 shows estimates of volume of Ponderosa
pine trees made by each of the three non-linear methods and the
amount of error represented by one standard error of estimate in each
direction, when X = 18, 30, and 40. Estimates made by the second-degree
curve and by the √Y, X relationship are not much different; all
three equations give about the same estimate of volume when X = 18.
In absolute terms, the error is constant whether Y is large or small, when
the second-degree equation is used; for either of the other two equation
types, the error becomes greater as X increases. For small values of X,
the logarithmic relationship shows the smallest errors; while for large
values of X, the second-degree curve shows the smallest errors. The
√Y, X relationship is generally intermediate between these two.

One criterion that has been suggested for comparing the suitability of
different equation types consists of computing a $Y_c$ value for each
observed value of X and calculating $\sqrt{\dfrac{\sum(Y - Y_c)^2}{N}}$ for each equation.
This has already been done for the second-degree equation, and, since the
least-squares fit minimized $\sum(Y - Y_c)^2$, the value of $s_{Y.X}$ = 13.2 would be
expected to be smallest. It is somewhat surprising that the √Y, X
relationship, which involved a least-squares fit to the √Y values, also gives
13.2 as the standard deviation of the Y values around the $Y_c$ values. For the
logarithmic relationship, which involved a least-squares fit to the log Y values,
the standard deviation of the Y values around the $Y_c$ values is 14.9. In each
instance, the unit is tens of board feet.

Another criterion consists of undertaking to ascertain the estimating
equation around which the Y values are most nearly normally distributed.
Since N is only 20, this hardly seems appropriate for this example.
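The first criterion lends itself to a brief numerical check. The Python sketch below is offered only as an illustration: the X and Y lists are the diameters and volumes as read from Table 20.5, the fits are recomputed from those values with unrounded square roots and logarithms, and the helper names `linear_fit` and `rms_error` are assumptions made for the example.

```python
import math

# Diameter (inches) and volume (board feet / 10) of the twenty trees, Table 20.5.
X = [36, 28, 28, 41, 19, 32, 22, 38, 25, 17, 31, 20, 25, 19, 39, 33, 17, 37, 23, 39]
Y = [192, 113, 88, 294, 28, 123, 51, 252, 56, 16,
     141, 32, 86, 21, 231, 187, 22, 205, 57, 265]

def linear_fit(u, v):
    """Least-squares constants (a, b) of v = a + b*u."""
    n = len(u)
    su, sv = sum(u), sum(v)
    suv = sum(p * q for p, q in zip(u, v))
    suu = sum(p * p for p in u)
    b = (n * suv - su * sv) / (n * suu - su * su)
    return (sv - b * su) / n, b

def rms_error(actual, computed):
    """Standard deviation of actual values around computed values (divisor N)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(actual, computed)) / len(actual))

# Square-root relationship: fit sqrt(Y) on X, then square the estimates.
a1, b1 = linear_fit(X, [math.sqrt(y) for y in Y])
rms_sqrt = rms_error(Y, [(a1 + b1 * x) ** 2 for x in X])

# Logarithmic relationship: fit log Y on log X, then take antilogs.
a2, b2 = linear_fit([math.log10(x) for x in X], [math.log10(y) for y in Y])
rms_log = rms_error(Y, [10 ** (a2 + b2 * math.log10(x)) for x in X])

print(round(rms_sqrt, 1), round(rms_log, 1))
```

Both figures are expressed in the units of Y, so, unlike the three standard errors of estimate, they can be compared directly.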

TABLE 20.6

Estimates of Volume of Ponderosa Pine Trees and Zones of ± One Standard
Error of Estimate for Three Equation Types When X = 18, 30, and 40 Inches

(The values in the body of the table are board feet ÷ 10.)

                     X = 18 inches            X = 30 inches            X = 40 inches
Estimating        Negative       Positive  Negative       Positive  Negative       Positive
equation           error    Yc    error     error    Yc    error     error    Yc    error

Second-degree       13.2   22.5    13.2      13.2  122.1    13.2      13.2  268.1    13.2
Logarithmic          3.0   23.3     3.6      15.3  115.9    17.7      37.8  285.8    43.5
√Y, X                5.2   22.1     5.9      12.7  122.5    13.5      19.0  268.3    19.7


As indicated at the outset, there is little basis for choice among the
three non-linear equation types. Perhaps the information presented in
the preceding paragraphs, together with the logical implication of the
√Y, X relationship, mentioned on page 504, may cause one to be inclined
to choose it. When several procedures are of about equal merit, it is not
inappropriate to choose the simplest one or the one which is easiest to
compute. On this basis, too, we might select the √Y, X relationship.

The log Y, X relationship. When correlating logarithms of Y
values with X values, the estimating equation is of the type

$$(\log Y)_c = \log a + X \log b.$$

The normal equations are

I. $\sum \log Y = N \log a + \log b \sum X$;
II. $\sum(X \cdot \log Y) = \log a \sum X + \log b \sum X^2$.

Total variation is (see note 8)

$$\sum(\log y)^2 = \sum(\log Y)^2 - \overline{\log Y}\sum \log Y;$$

explained variation is (see note 9)

$$\sum(\log y)_c^2 = \log a \sum \log Y + \log b \sum(X \log Y) - \overline{\log Y}\sum \log Y;$$

and unexplained variation is

$$\sum(\log y)_s^2 = \sum(\log y)^2 - \sum(\log y)_c^2.$$

The coefficient of determination may be obtained from

$$r^2_{\log Y.X} = \frac{\sum(\log y)_c^2}{\sum(\log y)^2}.$$

The coefficient of correlation is, of course, the square root of the coefficient
of determination. If log a and log b are not needed, $r_{\log Y.X}$ may be
computed from

$$r_{\log Y.X} = \frac{N\sum(X \cdot \log Y) - (\sum X)(\sum \log Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum(\log Y)^2 - (\sum \log Y)^2]}}.$$

The standard error of estimate is

$$s_{\log Y.X} = \sqrt{\frac{\sum(\log y)_s^2}{N}}.$$
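A minimal Python sketch of the log Y, X procedure follows. The five pairs of values are hypothetical, constructed so that Y = 3 · 2^X exactly; no data for this relationship are given in the text, and the function name `fit_log_y_on_x` is likewise an assumption.

```python
import math

def fit_log_y_on_x(x, y):
    """Fit (log Y)c = log a + X log b; return (a, b, r)."""
    n = len(x)
    log_y = [math.log10(v) for v in y]
    sx, sly = sum(x), sum(log_y)
    sxly = sum(p * q for p, q in zip(x, log_y))
    sxx = sum(p * p for p in x)
    slyly = sum(q * q for q in log_y)
    log_b = (n * sxly - sx * sly) / (n * sxx - sx * sx)   # from the normal equations
    log_a = (sly - log_b * sx) / n
    # Product-moment form, which does not require log a and log b.
    r = (n * sxly - sx * sly) / math.sqrt((n * sxx - sx * sx) * (n * slyly - sly * sly))
    return 10 ** log_a, 10 ** log_b, r

# Hypothetical series that doubles with each unit of X.
a, b, r = fit_log_y_on_x([0, 1, 2, 3, 4], [3, 6, 12, 24, 48])
print(round(a, 6), round(b, 6), round(r, 6))
```

With these artificial values the fit recovers a = 3 and b = 2, and the correlation between log Y and X is perfect; with observed data the same function would simply return the least-squares constants.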



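The 1/Y, X relationship. For this relationship, the estimating equation
is of the type

$$\left(\frac{1}{Y}\right)_c = a + bX.$$

The normal equations are

I. $\sum\dfrac{1}{Y} = Na + b\sum X$;
II. $\sum\dfrac{X}{Y} = a\sum X + b\sum X^2$.

Total variation is¹³

$$\sum\left(\frac{1}{y}\right)^2 = \sum\left(\frac{1}{Y}\right)^2 - \overline{\left(\frac{1}{Y}\right)}\sum\frac{1}{Y}.$$

Explained variation is

$$\sum\left(\frac{1}{y}\right)_c^2 = a\sum\frac{1}{Y} + b\sum\frac{X}{Y} - \overline{\left(\frac{1}{Y}\right)}\sum\frac{1}{Y},$$

and unexplained variation is

$$\sum\left(\frac{1}{y}\right)_s^2 = \sum\left(\frac{1}{y}\right)^2 - \sum\left(\frac{1}{y}\right)_c^2.$$

The coefficient of determination may be computed from

$$r^2_{\frac{1}{Y}.X} = \frac{\sum\left(\frac{1}{y}\right)_c^2}{\sum\left(\frac{1}{y}\right)^2},$$

and the square root is $r_{\frac{1}{Y}.X}$. Alternatively, the correlation coefficient
may be had from

$$r_{\frac{1}{Y}.X} = \frac{N\sum\frac{X}{Y} - (\sum X)\left(\sum\frac{1}{Y}\right)}{\sqrt{[N\sum X^2 - (\sum X)^2]\left[N\sum\left(\frac{1}{Y}\right)^2 - \left(\sum\frac{1}{Y}\right)^2\right]}},$$

which does not call for the values of a and b. The standard error of
estimate is

$$s_{\frac{1}{Y}.X} = \sqrt{\frac{\sum\left(\frac{1}{y}\right)_s^2}{N}}.$$

¹³ Note that $\sum\left(\frac{1}{y}\right)^2 = \sum\left[\frac{1}{Y} - \overline{\left(\frac{1}{Y}\right)}\right]^2$. It is not $\sum\left[\frac{1}{Y - \bar{Y}}\right]^2$.
Similarly, $\sum\left(\frac{1}{y}\right)_c^2 = \sum\left[\left(\frac{1}{Y}\right)_c - \overline{\left(\frac{1}{Y}\right)}\right]^2$ and $\sum\left(\frac{1}{y}\right)_s^2 = \sum\left[\frac{1}{Y} - \left(\frac{1}{Y}\right)_c\right]^2$.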

THE CORRELATION RATIO, η

When data are arranged in a correlation table as in Table 20.7 and
when a non-linear relationship is present, it is sometimes of interest to
know the value of the correlation coefficient which would result if the
arithmetic means of the columns were used instead of an estimating
equation. Chart 20.10 shows, by the use of horizontal lines, the column
means of Table 20.7. It also shows, for purposes of comparison, a second-degree
curve fitted to the data. The measure of correlation, based upon
the means of the columns, is $\eta_{Y.X}$, the correlation ratio. It is similar to
the correlation coefficients that we have already discussed in that it is the
square root of the proportion of the total variation in the Y series that
has been explained by the variation of the column means. That is,

$$\eta_{Y.X} = \sqrt{\frac{\text{variation explained by column means}}{\text{total variation of the } Y \text{ series}}}$$

Chart 20.10. Yield per Acre and Man-Hours per Ton Required
to Harvest Broom Corn in East-Central Illinois. Horizontal lines
indicate average man-hours per ton for each yield, while curve represents
computed values from the second-degree equation fitted to the data.
This equation was computed on pp. 721-725 of the
first edition of this text. Data from source given below Table 20.7.
(Vertical axis: man-hours per ton.)

There is also a correlation ratio, $\eta_{X.Y}$, which is the square root of the proportion
of the total variation in the X series that has been explained by the variation of the
row means.
or, in symbols,

$$\eta_{Y.X} = \sqrt{\frac{\sum\limits_{1}^{k}\sum\limits_{1}^{N_c}(\bar{Y}_c - \bar{Y})^2}{\sum(Y - \bar{Y})^2}},$$

where $\bar{Y}_c$ is the arithmetic mean of a column,

$N_c$ is the number of items in a column,

$\sum\limits_{1}^{N_c}$ indicates a summation over the $N_c$ items in a column, and

$\sum\limits_{1}^{k}$ indicates a summation over the k columns.

Since the data of a correlation table are in terms of class intervals, this
expression must be rewritten as for a frequency distribution or as for a
correlation coefficient computed from a correlation table. The expression
becomes




indicating that 68 per cent of the variation in man hours (the Y variable) 
has been explained by the use of the column means. The correlation 


Proof of the equality of the first and last of the three expressions follows that 
shown in Appendix S, section 26.1, 
ratio is the square root of this value, so

$$\eta_{Y.X} = 0.825.$$

The correlation ratio has no sign, since the relationship is not necessarily
positive, or negative, for all values of the two series with which one may
be dealing. Furthermore, the horizontal axis may represent qualitative
categories rather than numerical values.

The correlation ratio is of interest primarily because of its relationship
to the curvilinear correlation coefficient. The correlation ratio is always
equal to or larger than the correlation coefficient obtained by use of a
curve fitted to the grouped data, provided the number of constants in the
equation is equal to or smaller than the number of columns used in
computing $\eta_{Y.X}$. Both $\eta_{Y.X}$ and the curvilinear correlation coefficient
become larger as the number of columns or the number of constants in
the equation is increased.

There are several limitations to the usefulness of the correlation ratio.
First, the data must be grouped (not necessarily on both axes, but the
independent variable must be grouped). Second, if the number of groups
for the independent variable is increased, the value of the correlation
ratio increases, becoming 1.0 if the groups become so numerous that there
is only one observation in each group. Third, there is no estimating
equation, and therefore no satisfactory way of making estimates of the
dependent variable.
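The definition of the correlation ratio, and the second limitation just mentioned, can be illustrated with a few lines of Python. The grouped values below are hypothetical and are not those of Table 20.7; the function name `correlation_ratio` is an illustrative choice.

```python
def correlation_ratio(columns):
    """eta for a mapping of each X group (column) to the Y values falling in it."""
    all_y = [y for ys in columns.values() for y in ys]
    grand_mean = sum(all_y) / len(all_y)
    total = sum((y - grand_mean) ** 2 for y in all_y)
    # Each column mean is counted once for every item in the column.
    explained = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
                    for ys in columns.values())
    return (explained / total) ** 0.5

# Hypothetical correlation table with three columns; the groups may be qualitative.
columns = {"low": [9, 11, 10], "medium": [6, 7, 5], "high": [8, 9, 10]}
eta = correlation_ratio(columns)
print(round(eta ** 2, 4), round(eta, 3))

# One observation per column: the column means explain all of the variation.
singletons = {i: [y] for i, y in enumerate([9, 11, 10, 6, 7])}
print(correlation_ratio(singletons))
```

The second call shows why increasing the number of groups inflates the ratio: with a single observation in each column, every column mean coincides with its observation and the ratio is 1.0.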



Symbols Used in Chapter 21 


For the symbols used in the first paragraph of this chapter, see the list
accompanying Chapter 19.

$a_{1.2}$: value of $X_{c1.2}$ when $X_2 = 0$ in the estimating equation $X_{c1.2} = a_{1.2} + b_{12}X_2$.
Same as a in the estimating equation $Y_c = a + bX$ used in Chapter 19.

$a_{1.3}$: value of $X_{c1.3}$ when $X_3 = 0$ in the estimating equation $X_{c1.3} = a_{1.3} + b_{13}X_3$.

$a_{1.23}$: value of $X_{c1.23}$ when $X_2 = 0$ and $X_3 = 0$ in the estimating equation
$X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3$.

$a_{1.24}$: value of $X_{c1.24}$ when $X_2 = 0$ and $X_4 = 0$ in the estimating equation
$X_{c1.24} = a_{1.24} + b_{12.4}X_2 + b_{14.2}X_4$.

$a_{1.34}$: value of $X_{c1.34}$ when $X_3 = 0$ and $X_4 = 0$ in the estimating equation
$X_{c1.34} = a_{1.34} + b_{13.4}X_3 + b_{14.3}X_4$.

$a_{1.234 \cdots m}$: value of $X_{c1.234 \cdots m}$ when $X_2, X_3, X_4, \cdots, X_m$ equal zero in
the estimating equation $X_{c1.234 \cdots m} = a_{1.234 \cdots m} + b_{12.34 \cdots m}X_2 +
b_{13.24 \cdots m}X_3 + b_{14.23 \cdots m}X_4 + \cdots + b_{1m.23 \cdots (m-1)}X_m$.

$a_{1.22'3}$: value of $X_{c1.22'3}$ when $X_2$, $X_{2'}$, and $X_3$ equal zero in the estimating
equation $X_{c1.22'3} = a_{1.22'3} + b_{12.2'3}X_2 + b_{12'.23}X_{2'} + b_{13.22'}X_3$.

$b_{12}$: coefficient of $X_2$ in the estimating equation $X_{c1.2} = a_{1.2} + b_{12}X_2$.
Same as b in Chapter 19.

$b_{13}$: coefficient of $X_3$ in the estimating equation $X_{c1.3} = a_{1.3} + b_{13}X_3$.

$b_{12.3}$: coefficient of $X_2$ in the estimating equation $X_{c1.23} = a_{1.23} + b_{12.3}X_2
+ b_{13.2}X_3$.

$b_{13.2}$: coefficient of $X_3$ in the estimating equation $X_{c1.23} = a_{1.23} + b_{12.3}X_2
+ b_{13.2}X_3$.

$b_{12.4}$, $b_{14.2}$: coefficients, respectively, of $X_2$ and $X_4$ in the estimating
equation shown above for $a_{1.24}$.

$b_{13.4}$, $b_{14.3}$: coefficients, respectively, of $X_3$ and $X_4$ in the estimating
equation shown above for $a_{1.34}$.

$b_{12.34}$: coefficient of $X_2$ in the estimating equation $X_{c1.234} = a_{1.234} +
b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4$.

$b_{13.24}$: coefficient of $X_3$ in the estimating equation $X_{c1.234} = a_{1.234} +
b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4$.

$b_{14.23}$: coefficient of $X_4$ in the estimating equation $X_{c1.234} = a_{1.234} +
b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4$.
$b_{12.34 \cdots m}$, $b_{13.24 \cdots m}$, $b_{14.23 \cdots m}$, $\cdots$, $b_{1m.23 \cdots (m-1)}$: coefficients,
respectively, of $X_2, X_3, X_4, \cdots, X_m$ in the estimating equation given above
for $a_{1.234 \cdots m}$.

$b_{12.2'3}$, $b_{12'.23}$, $b_{13.22'}$: coefficients, respectively, of $X_2$, $X_{2'}$, and $X_3$ in the
estimating equation given above for $a_{1.22'3}$.

$b_{21}$: coefficient of $X_1$ in the estimating equation $X_{c2.1} = a_{2.1} + b_{21}X_1$.

Used in this chapter only to assist in the computation of ^12.34. 
$\beta_{12.34}$, $\beta_{13.24}$, $\beta_{14.23}$: lower-case Greek beta; beta coefficients which represent
one way of measuring the individual importance of, respectively, the
variables $X_2$, $X_3$, and $X_4$. $\beta_{1m.23 \cdots (m-1)}$ is the generalized form for
measuring the importance of $X_m$.

$d_{12.34}$, $d_{13.24}$, $d_{14.23}$: coefficients of separate determination. One way of
measuring the individual importance of, respectively, $X_2$, $X_3$, and $X_4$.
$d_{1m.23 \cdots (m-1)}$ is the generalized form for measuring the importance of
$X_m$.

N: the number of items in a sample. In multiple or partial correlation,
N is the number of sets of observations.
$r_{12}^2$: coefficient of determination for $X_1$ and $X_2$.
$r_{13}^2$: coefficient of determination for $X_1$ and $X_3$.
$r_{14}^2$: coefficient of determination for $X_1$ and $X_4$.
$r_{23}^2$: coefficient of determination for $X_2$ and $X_3$.
$r_{24}^2$: coefficient of determination for $X_2$ and $X_4$.
$r_{34}^2$: coefficient of determination for $X_3$ and $X_4$.

$r_{12.3}^2$: coefficient of partial determination; the additional variation in $X_1$
explained by $X_2$, expressed as a proportion of the variation in $X_1$ which
was unexplained by $X_3$.

$r_{13.2}^2$: coefficient of partial determination; the additional variation in $X_1$
explained by $X_3$, expressed as a proportion of the variation in $X_1$ which
was unexplained by $X_2$.

7*12 4, ri3.4, 7 * 14 . 2 , 7 * 113 , ^ 3, coefficieiits of pnrtiai Wrvii in 

this chapter to a.'>sist in computing various o' her up’ ' .u* , 

$r_{12.34}^2$: coefficient of partial determination; the additional variation in $X_1$
explained by $X_2$, expressed as a proportion of the variation in $X_1$ which
was unexplained by $X_3$ and $X_4$.

$r_{13.24}^2$: coefficient of partial determination; the additional variation in $X_1$
explained by $X_3$, expressed as a proportion of the variation in $X_1$ which
was unexplained by $X_2$ and $X_4$.

$r_{14.23}^2$: coefficient of partial determination; the additional variation in $X_1$
explained by $X_4$, expressed as a proportion of the variation in $X_1$
which was unexplained by $X_2$ and $X_3$.

$r_{12.34 \cdots m}^2$: a general form of the coefficient of partial determination; the
additional variation in $X_1$ explained by $X_2$, expressed as a proportion
of the variation in $X_1$ which was unexplained by $X_3, X_4, \cdots, X_m$.

$r_{1m.23 \cdots (m-1)}^2$: a general form of the coefficient of partial determination;
the additional variation in $X_1$ explained by $X_m$, expressed as a proportion
of the variation in $X_1$ which was unexplained by $X_2, X_3, \cdots, X_{(m-1)}$.

$r_{1m.23 \cdots (m-2)}$, $r_{1(m-1).23 \cdots (m-2)}$, $r_{m(m-1).23 \cdots (m-2)}$: general forms of
coefficients of partial correlation used in this chapter to compute
$r_{1m.23 \cdots (m-1)}$. Note that the three coefficients are one order below the
one being computed; the first excludes $X_{(m-1)}$, the second excludes $X_m$,
and the third excludes $X_1$.

$r_{1(34).2}^2$: multiple-partial coefficient of determination; the additional
variation in $X_1$ explained by $X_3$ and $X_4$, expressed as a proportion of the
variation in $X_1$ which was unexplained by $X_2$.

$R_{1.23}^2$: coefficient of multiple determination; the proportion of variation in
$X_1$ which was explained by $X_2$ and $X_3$.

$R_{1.24}^2$: coefficient of multiple determination; the proportion of variation in
$X_1$ which was explained by $X_2$ and $X_4$.

$R_{1.34}^2$: coefficient of multiple determination; the proportion of variation
in $X_1$ which was explained by $X_3$ and $X_4$.

$R_{1.234}^2$: coefficient of multiple determination; the proportion of variation
in $X_1$ which was explained by $X_2$, $X_3$, and $X_4$.

$R_{1.234 \cdots m}^2$: general form of the coefficient of multiple determination; the
proportion of variation in $X_1$ which was explained by $X_2, X_3, X_4, \cdots, X_m$.

$R_{1.234 \cdots (m-1)}^2$: a general form of the coefficient of multiple determination
used to assist in the computation of $r_{1m.23 \cdots (m-1)}^2$; the proportion of
variation in $X_1$ which was explained by $X_2, X_3, X_4, \cdots, X_{(m-1)}$.

$R_{1.34 \cdots m}^2$: a general form of the coefficient of multiple determination used
to assist in the computation of $r_{12.34 \cdots m}^2$; the proportion of variation in
$X_1$ which was explained by $X_3, X_4, \cdots, X_m$.

$s_1, s_2, s_3, s_4, \cdots$: respectively, the standard deviations of the $X_1, X_2,
X_3, X_4, \cdots$ series.

$s_{1.2}$: standard error of estimate for the estimating equation $X_{c1.2} = a_{1.2} +
b_{12}X_2$. Same as $s_{Y.X}$ in Chapter 19.

$s_{1.3}$: standard error of estimate for the estimating equation $X_{c1.3} = a_{1.3} +
b_{13}X_3$.

$s_{1.23}$: standard error of estimate for the estimating equation $X_{c1.23} = a_{1.23}
+ b_{12.3}X_2 + b_{13.2}X_3$.

$s_{1.24}$: standard error of estimate for the estimating equation $X_{c1.24} =
a_{1.24} + b_{12.4}X_2 + b_{14.2}X_4$.

$s_{1.34}$: standard error of estimate for the estimating equation $X_{c1.34} =
a_{1.34} + b_{13.4}X_3 + b_{14.3}X_4$.

$s_{1.234}$: standard error of estimate for the estimating equation $X_{c1.234} =
a_{1.234} + b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4$.

$s_{1.234 \cdots m}$: a general form of the standard error of estimate for the estimating
equation given above for $a_{1.234 \cdots m}$.

Sm.nz a general form of the standard error of estimate used to 

assist in oomputing 23. 

$\sum$: upper-case Greek sigma, meaning "take the sum of."

$\sum x_1^2$: total variation of the $X_1$ values.

$\sum x_{c1.2}^2$, $\sum x_{c1.3}^2$, $\sum x_{c1.4}^2$: variation of $X_1$ explained, respectively, by $X_2$, by
$X_3$, and by $X_4$.

$\sum x_{c1.23}^2$, $\sum x_{c1.24}^2$, $\sum x_{c1.34}^2$: variation of $X_1$ explained, respectively, by $X_2$ and
$X_3$, by $X_2$ and $X_4$, and by $X_3$ and $X_4$.

$\sum x_{c1.234}^2$: variation of $X_1$ explained by $X_2$, $X_3$, and $X_4$.

$\sum x_{c1.234 \cdots m}^2$: a general form for explained variation; the variation of $X_1$
explained by $X_2, X_3, X_4, \cdots, X_m$.

$\sum x_{c1.234 \cdots (m-1)}^2$, $\sum x_{c1.34 \cdots m}^2$: general forms for explained variation; the
variation of $X_1$ explained, respectively, by $X_2, X_3, X_4, \cdots, X_{(m-1)}$
and by $X_3, X_4, \cdots, X_m$. Used to assist in computing $r_{1m.23 \cdots (m-1)}^2$
and $r_{12.34 \cdots m}^2$.

$\sum x_{s1.2}^2$, $\sum x_{s1.3}^2$, $\sum x_{s1.4}^2$: variation of $X_1$ unexplained, respectively, by $X_2$, by
$X_3$, and by $X_4$.

$\sum x_{s1.23}^2$, $\sum x_{s1.24}^2$, $\sum x_{s1.34}^2$: variation of $X_1$ unexplained, respectively, by $X_2$
and $X_3$, by $X_2$ and $X_4$, and by $X_3$ and $X_4$.

$\sum x_{s1.234}^2$: variation of $X_1$ unexplained by $X_2$, $X_3$, and $X_4$.

$\sum x_{s1.234 \cdots m}^2$: a general form for unexplained variation; the variation of $X_1$
unexplained by $X_2, X_3, X_4, \cdots, X_m$.

$\sum x_{s1.234 \cdots (m-1)}^2$, $\sum x_{s1.34 \cdots m}^2$: general forms for unexplained variation; the
variation of $X_1$ unexplained, respectively, by $X_2, X_3, X_4, \cdots, X_{(m-1)}$
and by $X_3, X_4, \cdots, X_m$. Used to assist in computing $r_{1m.23 \cdots (m-1)}^2$
and $r_{12.34 \cdots m}^2$.

$x_1, x_2, x_3, x_4, \cdots, x_m$: values in the $X_1, X_2, X_3, X_4, \cdots, X_m$ series
expressed as deviations from their respective arithmetic means.

$x_{c1}$: see $\sum x_{c1}^2$ with various additional subscripts.

$x_{s1}$: see $\sum x_{s1}^2$ with various additional subscripts.

$X_1$: the $X_1$ series, also an observed value in the $X_1$ series. Thus, we refer
to correlating $X_1$ with $X_2$, $X_3$, and $X_4$, but $\sum X_1$ means "take the sum
of the values in the $X_1$ series."

$X_2, X_3, X_4, \cdots, X_m$: respectively, the $X_2, X_3, X_4, \cdots, X_m$ series; also
observed values in those series. See $X_1$.

$\bar{X}_1, \bar{X}_2, \bar{X}_3, \bar{X}_4, \cdots, \bar{X}_m$: respectively, the arithmetic means of the
$X_1, X_2, X_3, X_4, \cdots, X_m$ series.

$X_{c1.2}$: a computed value of the $X_1$ series when the estimating equation
$X_{c1.2} = a_{1.2} + b_{12}X_2$ is used. Same as $Y_c$ in Chapter 19.

$X_{c1.3}$: a computed value of the $X_1$ series when the estimating equation
$X_{c1.3} = a_{1.3} + b_{13}X_3$ is used.

$X_{c1.23}$: a computed value of the $X_1$ series when the estimating equation
$X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3$ is used.

$X_{c1.24}$: a computed value of the $X_1$ series when the estimating equation
shown above for $a_{1.24}$ is used.

$X_{c1.34}$: a computed value of the $X_1$ series when the estimating equation
shown above for $a_{1.34}$ is used.

$X_{c1.234}$: a computed value of the $X_1$ series when the estimating equation
$X_{c1.234} = a_{1.234} + b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4$ is used.

$X_{c1.234 \cdots m}$: a computed value of the $X_1$ series when the estimating equation
shown above for $a_{1.234 \cdots m}$ is used.

$X_{c1.22'3}$: a computed value of the $X_1$ series when the estimating equation
shown above for $a_{1.22'3}$ is used.
Correlation III: Multiple and Partial
Correlation


PRELIMINARY EXPLANATION

Simple correlation. Before plunging into multiple
and partial correlation, it will be useful to review briefly the elementary
principles of two-variable linear correlation, since the more refined
measures involve simply an extension of the procedures already discussed.
First, an estimating equation of the type

$$Y_c = a + bX$$

was computed by the method of least squares. This permitted us to
make estimates of the value of the dependent variable from values of the
independent variable. Next, it was demonstrated that the total variation
of the dependent variable was the sum of: (1) the explained variation
and (2) the variation which we had failed to explain by our hypothesis;
that is, that

$$\sum y^2 = \sum y_c^2 + \sum y_s^2.$$

It should be remembered that we computed $\sum y^2$ by the formula

$$\sum y^2 = \sum Y^2 - \bar{Y}\sum Y$$

and that $\sum y_c^2$ was computed from the expression

$$\sum y_c^2 = \sum Y_c^2 - \bar{Y}\sum Y,$$

in which

$$\sum Y_c^2 = a\sum Y + b\sum XY$$

or, more simply,

$$\sum y_c^2 = b\sum xy.$$

The standard error of estimate $s_{Y.X}$, which is

$$s_{Y.X} = \sqrt{\frac{\sum y_s^2}{N}},$$

enabled us to judge the range of error of our estimates of the dependent
variable. $\sum y_s^2$ was obtained by subtracting the explained variation from
the total variation; that is,

$$\sum y_s^2 = \sum y^2 - \sum y_c^2.$$

Finally, a measure was computed that permitted us to state the proportion
of total variation which had been explained by variations in the computed
values of the dependent variable. This ratio,

$$r^2 = \frac{\sum y_c^2}{\sum y^2},$$

was known as the coefficient of determination, and its square root was
called the coefficient of correlation.
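The review just given can be condensed into a short Python sketch. The five pairs of X and Y values are hypothetical and serve only to show that the several expressions agree; the function name `simple_correlation` and the dictionary keys are illustrative assumptions.

```python
def simple_correlation(x, y):
    """Two-variable linear correlation by the formulas reviewed above."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(p * q for p, q in zip(x, y))
    sxx = sum(p * p for p in x)
    syy = sum(q * q for q in y)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)    # least-squares constants
    a = (sy - b * sx) / n
    mean_y = sy / n
    total = syy - mean_y * sy                        # sum y^2  = sum Y^2 - Ybar sum Y
    explained = a * sy + b * sxy - mean_y * sy       # sum yc^2 = a sum Y + b sum XY - Ybar sum Y
    unexplained = total - explained                  # sum ys^2
    return {"a": a, "b": b, "total": total, "explained": explained,
            "unexplained": unexplained,
            "r2": explained / total,                 # coefficient of determination
            "s": (unexplained / n) ** 0.5}           # standard error of estimate

x = [1, 2, 3, 4, 5]          # hypothetical independent variable
y = [2, 4, 5, 4, 5]          # hypothetical dependent variable
result = simple_correlation(x, y)

# Unexplained variation computed directly from the residuals, as a cross-check.
residual_sum = sum((q - (result["a"] + result["b"] * p)) ** 2 for p, q in zip(x, y))
print(result, round(residual_sum, 6))
```

For these values a = 2.2 and b = 0.6, the total variation of 6 divides into 3.6 explained and 2.4 unexplained, and the coefficient of determination is 0.6.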

Multiple correlation. Exactly the same principles are involved in
multiple correlation as in simple correlation, but the procedure is more
laborious, since there is more than one independent variable. Also, it is
necessary to use slightly different symbols. The illustration in this
chapter will deal with the relationship between suicide rates by regions,
and average age, per cent male, and business-failure rate in those same
regions. Suicide rate is the dependent variable, and the other three are
independent variables.

To simplify computations so that they can be shown in full in this
chapter, the United States has been divided into 19 regions of substantially
equal population and more or less homogeneous characteristics.
With the exception of New York State, which has been divided into New
York City and upstate New York, the boundaries of these regions follow
state boundaries. The composition of the different regions can be
observed by reference to Table 21.1. Selection of homogeneous areas of
equal population serves to make the statistical results more meaningful
in that each region is given proper weight in the calculations. On the
other hand, use of only 19 observations with an equation of 4 constants
does make the degrees of freedom (see the section in Chapter 26 dealing
with the significance of multiple-correlation coefficients) rather small.
The results obtained must therefore be regarded as primarily of illustrative
importance.

It simplifies the notations somewhat if, instead of using different letters,
each of the variables is designated by the letter X, differentiating between
the variables by means of subscripts. This is particularly true if the
number of variables is large. We shall therefore designate our variables
in this manner:
Dependent Variable:

Suicide rate                 $X_1$

Independent Variables:

Average age                  $X_2$

Per cent male                $X_3$

Business-failure rate        $X_4$


It is interesting to note that, of our three independent variables, two
relate to characteristics of the population while only one, business-failure
rate, can be thought of as a possible cause. Whatever the causes of
suicides, it is reasonable to conjecture that they do not affect each age
and sex with equal intensity.

TABLE 21.1

Nineteen Relatively Homogeneous Regions in the United States of
Approximately Equal Population in 1950

Region    Population
number    in millions   States included
   1          6.5       Maine, New Hampshire, Vermont, Massachusetts
   2          7.6       Rhode Island, Connecticut, New Jersey
   3          7.9       New York City
   4          6.9       New York, excluding New York City
   5         10.5       Pennsylvania
   6          7.9       Ohio
   7         10.3       Indiana, Michigan
   8          8.7       Illinois
   9          6.4       Wisconsin, Minnesota
  10          6.6       Iowa, Missouri
  11          5.8       North Dakota, South Dakota, Nebraska, Kansas, Colorado
  12         10.8       Delaware, Maryland, District of Columbia, Virginia,
                        North Carolina
  13          8.3       South Carolina, Georgia, Florida
  14          8.2       West Virginia, Kentucky, Tennessee
  15          7.9       Alabama, Mississippi, Louisiana
  16          5.6       Arizona, New Mexico, Arkansas, Oklahoma
  17          6.2       Montana, Idaho, Wyoming, Washington, Oregon, Utah,
                        Nevada
  18         10.6       California
  19          7.7       Texas

In the pages that follow, we shall start with variables 1, 2, and 3, and, 
after explaining the basic concepts and computations, variable 4 will be 
introduced. General formulas for m variables will then be given. 

The first step in the correlation procedure is to obtain an equation
which includes both of the independent variables as a means of estimating
a suicide rate for any region. The estimate is labeled $X_{c1.23}$,
since it is an estimate of variable $X_1$ computed from variables $X_2$ and
$X_3$. Since there are two independent variables, there will be two b's.
The equation type will be

$$X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3.$$


A word concerning the meaning of the b's and their subscripts is
necessary. These net coefficients of estimation indicate the effect on $X_1$
of a change in the accompanying independent variable when allowance has
been made¹ for the other independent variable. Thus, $b_{12.3}$ is an
estimate of the variation in suicide rate associated with a variation in
average age, independent of variation in per cent male. The social
scientist is accustomed to saying "other things being equal." The other
thing which is held equal in this instance is the proportion of males in
the different regions. As between regions that have the same percentage of
males but differ with respect to age, each variation of one year in average
age between regions will normally be accompanied by a variation of
$b_{12.3}$ in suicide rate. The other b coefficient in the estimating
equation is interpreted analogously, the figure to the right of the decimal
point in the subscript indicating the factor that is held constant. Of
course, really to know the effect on suicides of age alone, we should hold
constant all other factors, not just per cent male. As we introduce more
and more variables, this desirable situation is more and more closely
approximated. The constant $a_{1.23}$ is the hypothetical value for suicide
rate when the other factors considered have a value of zero. The estimate
of suicide rate for any region is the sum of the net amounts associated
with each independent variable plus the value for $a$.

We might observe at this point that the natural scientist can often
design his experiment so as to control a number of the variables, such, for
instance, as temperature, humidity, or air pressure. The biologist and
the agricultural experimenter can control their variables to a considerable
extent. On the other hand, economics and sociology, and most of the
social sciences, generally have to use the observational rather than the
experimental method. Since workers in these fields usually have only
very limited control, if any, over the material they must use, they must
attempt to hold some of the variables constant statistically (rather than
experimentally) by means of the techniques explained in this chapter.²

¹ Technically, allowance is made for a variable by subtracting its effect
on the other variables. Thus, if

$$x_{s1.2} = x_1 - x_{c1.2}; \qquad x_{s3.2} = x_3 - x_{c3.2};$$
$$x_{s1.3} = x_1 - x_{c1.3}; \qquad x_{s2.3} = x_2 - x_{c2.3};$$

then $b_{12.3}$ is the slope of $x_{s1.3}$ on $x_{s2.3}$, and $b_{13.2}$ is
the slope of $x_{s1.2}$ on $x_{s3.2}$. Specifically:

$$b_{12} = \frac{\sum x_1x_2}{\sum x_2^2}, \quad\text{but}\quad b_{12.3} = \frac{\sum x_{s1.3}\,x_{s2.3}}{\sum x_{s2.3}^2};$$
$$b_{13} = \frac{\sum x_1x_3}{\sum x_3^2}, \quad\text{but}\quad b_{13.2} = \frac{\sum x_{s1.2}\,x_{s3.2}}{\sum x_{s3.2}^2}.$$
As in previous instances, the total variation of the dependent series is
the sum of two quantities: (1) the variation in the estimated values of
that series from their mean, and (2) the variation of the actual values
from the estimated values, that is,

$$\sum x_1^2 = \sum x_{c1.23}^2 + \sum x_{s1.23}^2.$$

The procedure for computing measures of relationship is essentially the
same as with simple correlation. The standard error of estimate is

$$s_{1.23} = \sqrt{\frac{\sum x_{s1.23}^2}{N}},$$

and the coefficient of multiple determination is

$$R_{1.23}^2 = \frac{\sum x_{c1.23}^2}{\sum x_1^2}.$$

$R_{1.23}^2$ states the proportion of total variation that is present in the
variations of the computed, or $X_c$, values, and which has been explained
by reference to the independent variables. The coefficient of multiple
correlation $R_{1.23}$ is the square root of the coefficient of multiple
determination. R has no sign, since the association may be positive with
one but negative with the other independent variable. It is interesting to
note at this point that, as additional associated independent variables are
brought into a problem, $R_{1.23\ldots m}$ approaches 1.0 and
$s_{1.23\ldots m}$ approaches zero. If we were able to include all pertinent
independent variables, $R_{1.23\ldots n}$ would be 1.0, and we could make
perfect estimates of $X_1$.

Partial correlation. We have seen that the use of variable $X_2$
resulted in a certain amount of explained variation, indicated by
$\sum x_{c1.2}^2$, but some of the variation in the dependent variable was
not explained; this was $\sum x_{s1.2}^2$. Introducing variable $X_3$, in
addition to $X_2$, gave explained variation indicated by
$\sum x_{c1.23}^2$, which must exceed $\sum x_{c1.2}^2$ if variable $X_3$ is
germane to the problem. In any event, $\sum x_{c1.23}^2$ cannot be smaller
than $\sum x_{c1.2}^2$.


² Another method, usually not practical, is to select from the observed
data observations that have a constant value with respect to all
independent variables except the one being studied.

Now, the amount of variation unexplained by $X_2$ was $\sum x_{s1.2}^2$,
but $X_3$ explained an additional amount of variation, indicated by
$\sum x_{c1.23}^2 - \sum x_{c1.2}^2$. If we write

$$\frac{\sum x_{c1.23}^2 - \sum x_{c1.2}^2}{\sum x_{s1.2}^2},$$

we have $r_{13.2}^2$, the coefficient of partial determination. To put the
above expression in words and to state it more generally, we may say that
the coefficient of partial determination is the ratio of: (1) the increase
in the variation of the computed values of the dependent variable resulting
from the introduction of another independent variable to (2) the variation
that had not been explained before the introduction of the new variable.

Since $\sum x_1^2 = \sum x_c^2 + \sum x_s^2$ for each estimating equation,
the expression for $r_{13.2}^2$ may be written in either of the two
following ways:

$$r_{13.2}^2 = \frac{\sum x_{s1.2}^2 - \sum x_{s1.23}^2}{\sum x_{s1.2}^2};$$

$$r_{13.2}^2 = \frac{\sum x_{c1.23}^2 - \sum x_{c1.2}^2}{\sum x_1^2 - \sum x_{c1.2}^2}.$$

If the numerator and denominator of the expression last given are divided
by $\sum x_1^2$, we have

$$r_{13.2}^2 = \frac{R_{1.23}^2 - r_{12}^2}{1 - r_{12}^2}.$$

In this form the coefficient of partial determination may be regarded as
the ratio of: (1) the increase in the proportion of variation of the
computed values of the dependent variable resulting from the introduction
of another independent variable to (2) the proportion of the variation that
had not been explained before the introduction of the new variable.

The square root of $r_{13.2}^2$ is the coefficient of partial correlation and
takes the sign of $b_{13.2}$ in the estimating equation. The subscript 13.2
for the coefficient of partial correlation indicates, for our problem, that
the correlation is between suicide rate, $X_1$, and per cent male, $X_3$,
when average age $X_2$ has been held constant at a value of $\bar{X}_2$. If
we could pick out regions that are exactly alike with respect to age, the
simple correlation between suicide rate and per cent male for those regions
would tend to be the same as the above coefficient of partial correlation.
One purpose of partial (or net) correlation coefficients is to indicate the
relative importance of the different independent variables in a problem in
explaining variations in the dependent variable.

COMPUTATION PROCEDURE

Computation of sums. Since this chapter will require a considerable
number of measures of relationship among the four variables, it will
be convenient to compute at one time all of the values that are needed
in the different formulas. The original data for the four series, together
with their sums and arithmetic means, are shown in Table 21.2. The
individual squares and products and the sums of the squares and products

TABLE 21.2

Suicide Rate, Average Age, Per Cent Male, and Business-Failure
Rate for 19 Regions of the United States, 1949 or 1950

1 

j 

Suicide 

Average* 

1 Per cent 

Rusiiiesa- 

Region i 

rate 

ago 

male 

fiiilurt* rate 

i 

X, 

v, 

■ As 

A* 


12 i<r 

81 28 

48 ’ 7:1 

51 , fi3 

2 i 

12 02 

82. 43 

! 40 27 

43 55 

a i 

10. 10 

3t 50 

! 48 43 

00 73 

4 * 

12 (U 

;V2 79 

' 10 27 

20 25 

5 

10 Tid 

31 30 

; 40 25 

28 00 

0 ; 

li 07 

31 20 

10 11 

35 32 

7 

11 14 

30 10 

50 17 

21 08 

8 i 

11 .aO 

32 70 

i 10 58 

3:3 59 

9 ■ 

U 42 

30 80 

50 30 

20 01 

10 

12 47 i 

1 31 80 

! 40 14 

1 19 13 

1 1 

12 75 ! 

20 03 

! 50 55 

8 74 

12 

10 11 ! 

20 10 

1 40 74 

1 20 52 

1*' : 

0 28 ' i 

20 90 

; 40 10 

, 27 01 

14 - - 

y 15 i 

20 87 

! 40 80 

: 23 J2 

if) 

G 50 i 

25 00 

i 40 20 

22 71 

Iti 

8 25 ; 

20 (io 

50 12 

! 10 17 

17 

14 20 ! 

20 21 

j 51 40 

.30 00 

IS 

17 50 ! 

32 10 

i 50 02 

i 81 03 

ID > 

\) . OS 1 

27 00 

! 50 10 

I 15 

Total       213.62       572.75       943.97       620.08
Mean     11.243158    30.144737    49.682632    32.635789

$X_1$. Rate per 100,000.

$X_2$. Median age where a state constitutes a region; otherwise, the
simple mean of the state medians. New York, excluding New York
City, computed from the relationship:

Y ^ I t‘d y j,. i^r*. A Ft Hf *- ~ I'^ty -^It'tlrity* 

$X_4$. Failures per 10,000 business concerns.

Data from publications listed below:

Population in 1950. United States Department of Commerce, Bureau of the
Census, Census of the United States, 1950, Vol. I.

Suicide rate in 1949. United States Department of Commerce, Bureau of the
Census, Vital Statistics of the United States, 1949, Part II, Place of
Residence.

Per cent male in 1950. United States Department of Commerce, Bureau of the
Census, Seventeenth Census of the United States, 1950, Vol. II,
Characteristics of the Population.

Business-failure rate, 1950. United States Department of Commerce, Bureau
of the Census, Statistical Abstract of the United States, 1951, and Dun and
Bradstreet, Inc.

are shown in Table 21.3. From these we obtain the sums of the squared
deviations and the sums of the products of deviations. For example,³


$$\sum x_1^2 = \sum X_1^2 - \bar{X}_1\sum X_1.$$
$$\sum x_2^2 = \sum X_2^2 - \bar{X}_2\sum X_2.$$
$$\sum x_1x_2 = \sum X_1X_2 - \bar{X}_1\sum X_2 \quad\text{or}\quad \sum X_1X_2 - \bar{X}_2\sum X_1.$$
$$\sum x_1x_3 = \sum X_1X_3 - \bar{X}_1\sum X_3 \quad\text{or}\quad \sum X_1X_3 - \bar{X}_3\sum X_1.$$

Using these, and similar formulas for the other sums, gives:⁴

$\sum x_1^2 = 2{,}505.54 - (11.243158)(213.62) = 103.78.$
$\sum x_2^2 = 17{,}375.31 - (30.144737)(572.75) = 109.91.$
$\sum x_3^2 = 46{,}907.35 - (49.682632)(943.97) = 8.44.$
$\sum x_4^2 = 26{,}146.00 - (32.635789)(620.08) = 5{,}909.20.$
$\sum x_1x_2 = 6{,}503.54 - (11.243158)(572.75) = 64.02.$
$\sum x_1x_3 = 10{,}621.88 - (11.243158)(943.97) = 8.68.$
$\sum x_1x_4 = 7{,}403.46 - (11.243158)(620.08) = 431.80.$
$\sum x_2x_3 = 28{,}445.56 - (30.144737)(943.97) = -10.17.$
$\sum x_2x_4 = 19{,}159.54 - (30.144737)(620.08) = 467.39.$
$\sum x_3x_4 = 30{,}731.28 - (49.682632)(620.08) = -75.93.$
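The ten subtractions above all follow one pattern, so they can be reproduced in a few lines. This is a minimal sketch, assuming the column totals of Table 21.2 and the raw sums of squares and products as read from the worked figures in the text; the dictionary layout and the function name `deviation_sum` are illustrative only.

```python
N = 19
# Column totals from Table 21.2 and raw sums of squares and products,
# keyed by variable subscript, as read from the worked figures.
total = {1: 213.62, 2: 572.75, 3: 943.97, 4: 620.08}
raw = {
    (1, 1): 2505.54, (2, 2): 17375.31, (3, 3): 46907.35, (4, 4): 26146.00,
    (1, 2): 6503.54, (1, 3): 10621.88, (1, 4): 7403.46,
    (2, 3): 28445.56, (2, 4): 19159.54, (3, 4): 30731.28,
}

def deviation_sum(i, j):
    """Sum of x_i * x_j, computed as sum(X_i X_j) - mean(X_i) * sum(X_j)."""
    return raw[(i, j)] - (total[i] / N) * total[j]

dev = {pair: round(deviation_sum(*pair), 2) for pair in raw}
for pair, value in dev.items():
    print(pair, value)
```

Taking the mean as `total[i] / N` rather than a rounded figure avoids carrying the six-decimal means by hand; to two decimals the results match the values listed in the text.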

³ The derivation of these equations is fairly obvious.

$$\sum x_1^2 = \sum (X_1 - \bar{X}_1)^2 = \sum (X_1^2 - 2X_1\bar{X}_1 + \bar{X}_1^2)$$
$$= \sum X_1^2 - 2\bar{X}_1\sum X_1 + N\bar{X}_1^2 = \sum X_1^2 - 2\bar{X}_1\sum X_1 + \bar{X}_1\sum X_1$$
$$= \sum X_1^2 - \bar{X}_1\sum X_1.$$

$$\sum x_1x_2 = \sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2) = \sum (X_1X_2 - \bar{X}_1X_2 - \bar{X}_2X_1 + \bar{X}_1\bar{X}_2)$$
$$= \sum X_1X_2 - \bar{X}_1\sum X_2 - \bar{X}_2\sum X_1 + N\bar{X}_1\bar{X}_2$$
$$= \sum X_1X_2 - \bar{X}_1\sum X_2 - \frac{\sum X_1\sum X_2}{N} + \frac{\sum X_1\sum X_2}{N} = \sum X_1X_2 - \bar{X}_1\sum X_2.$$

⁴ In Table 21.2 the observations usually have four significant digits.
Therefore, the products in Table 21.3 are usually recorded in five or six
digits. Nevertheless, the values shown here have only three digits in two
instances. The various measures in this chapter computed from these values
cannot contain more than three or four significant digits, and sometimes
only two or three. More have been recorded, however, in order to afford
internal checks on computations and to contribute to the accuracy of final
results based on intermediate computations.

Gross measures of relationship. Simple correlation is in reality
gross correlation, since it measures the relationship between two variables,
without any adjustment by correlation technique for the effects of other
variables. Using the symbols developed in the introductory section, we
compute the following measures if we wish to correlate suicide rates $X_1$
with average age $X_2$ alone:


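Estimating equation:

$$X_{c1.2} = a_{1.2} + b_{12}X_2 \quad\text{or}\quad x_{c1.2} = b_{12}x_2.$$

Normal equations:

I. $\sum X_1 = Na_{1.2} + b_{12}\sum X_2$ or $\bar{X}_1 = a_{1.2} + b_{12}\bar{X}_2$, so that $a_{1.2} = \bar{X}_1 - b_{12}\bar{X}_2$.

II. $\sum X_1X_2 = a_{1.2}\sum X_2 + b_{12}\sum X_2^2$ or $\sum x_1x_2 = b_{12}\sum x_2^2$, so that $b_{12} = \sum x_1x_2 / \sum x_2^2$.

Total variation:

$$\sum x_1^2 = \sum X_1^2 - \bar{X}_1\sum X_1.$$

Sum of squares of computed values and explained variation:

$$\sum X_{c1.2}^2 = a_{1.2}\sum X_1 + b_{12}\sum X_1X_2 \quad\text{(sum of squares of computed values)};$$
$$\sum x_{c1.2}^2 = b_{12}\sum x_1x_2 \quad\text{(explained variation)}.$$

Unexplained variation:

$$\sum x_{s1.2}^2 = \sum X_1^2 - \sum X_{c1.2}^2 \quad\text{or}\quad \sum x_{s1.2}^2 = \sum x_1^2 - \sum x_{c1.2}^2.$$

Standard error of estimate:

$$s_{1.2} = \sqrt{\frac{\sum X_1^2 - \sum X_{c1.2}^2}{N}} \quad\text{or}\quad s_{1.2} = \sqrt{\frac{\sum x_{s1.2}^2}{N}}.$$

Coefficient of correlation:

$$r_{12}^2 = \frac{\sum X_{c1.2}^2 - \bar{X}_1\sum X_1}{\sum X_1^2 - \bar{X}_1\sum X_1} \quad\text{or}\quad r_{12}^2 = \frac{\sum x_{c1.2}^2}{\sum x_1^2}.$$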

The reader may already have noticed that we have merely set down the
various equations and formulas used in simple correlation, but with
slightly different symbols.

Results of computations based on these expressions are given below.
In order to avoid needless labor, the formulas shown on the right above,
using deviations from means, are used.

Constants for estimating equation:

$$b_{12} = \frac{64.02}{109.91} = +0.58248.$$
$$a_{1.2} = 11.243 - (0.58248)(30.144737) = -6.316.$$

Estimating equation:

$$X_{c1.2} = -6.316 + 0.5825X_2.$$
$$x_{c1.2} = 0.5825x_2.$$

Total variation:

$$\sum x_1^2 = 2{,}505.54 - (11.243158)(213.62) = 103.78.$$

Explained variation:

$$\sum x_{c1.2}^2 = (0.58248)(64.02) = 37.290.$$

Unexplained variation:

$$\sum x_{s1.2}^2 = 103.78 - 37.290 = 66.490.$$

Standard error of estimate:

$$s_{1.2}^2 = \frac{66.490}{19} = 3.499; \qquad s_{1.2} = 1.87.$$

Coefficient of correlation:

$$r_{12}^2 = \frac{37.290}{103.78} = 0.35932; \qquad r_{12} = +0.5994.$$

Following the same procedure for variable 3, we obtain:

$b_{13} = +1.02844$;
$a_{1.3} = -39.853$;
$\sum x_{c1.3}^2 = 8.927$;
$\sum x_{s1.3}^2 = 94.853$;
$s_{1.3} = 2.23$;
$r_{13}^2 = 0.08602$;
$r_{13} = +0.2933$.
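The gross measures for age alone can be recomputed from three deviation sums and two means. The sketch below is illustrative rather than part of the original procedure; it assumes the sums and means quoted in the text and, as the text does, divides by N rather than by the degrees of freedom.

```python
import math

N = 19
mean_x1, mean_x2 = 11.243158, 30.144737
sx1sq, sx2sq, sx1x2 = 103.78, 109.91, 64.02   # deviation sums from the text

b12 = sx1x2 / sx2sq                  # slope of suicide rate on average age
a12 = mean_x1 - b12 * mean_x2        # intercept from normal equation I
explained = b12 * sx1x2              # variation of the computed values
unexplained = sx1sq - explained
s12 = math.sqrt(unexplained / N)     # standard error of estimate (divisor N)
r12 = math.copysign(math.sqrt(explained / sx1sq), b12)  # r takes the sign of b

print(round(b12, 5), round(a12, 3), round(s12, 2), round(r12, 4))
```

Replacing the three sums with those for per cent male (103.78, 8.44, 8.68) and the mean with 49.682632 gives the corresponding measures for variable 3.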

Chart 21.1 shows scatter diagrams of the simple relationship between
suicide rates and each of the independent variables being considered.
The correlation coefficients for these three relationships and the
coefficients of correlation between the three independent variables are:

$r_{12} = +0.5994.$     $r_{23} = -0.3339.$
$r_{13} = +0.2933.$     $r_{24} = +0.5800.$
$r_{14} = +0.5514.$     $r_{34} = -0.3400.$

It is interesting to note, at this point, that average age, $X_2$, showed the
highest gross correlation with suicide rates, and that per cent male, $X_3$,
showed the lowest. Later we shall see whether the independent variables
retain the same rank in importance when the effect of the other variables
is removed.

Chart 21.1. Scatter Diagrams of Suicide Rate $X_1$ and Each of the Three
Independent Variables: Average Age $X_2$, Per Cent Male $X_3$, and
Business-Failure Rate $X_4$. Data from Table 21.2.
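All six gross coefficients come from the same formula, $r_{ij} = \sum x_ix_j / \sqrt{\sum x_i^2 \sum x_j^2}$, so they can be tabulated together. A short sketch, assuming the deviation sums computed earlier in the chapter (the dictionary names are illustrative):

```python
import math
from itertools import combinations

# Sums of squared deviations and of products of deviations, by subscript.
sq = {1: 103.78, 2: 109.91, 3: 8.44, 4: 5909.20}
cross = {(1, 2): 64.02, (1, 3): 8.68, (1, 4): 431.80,
         (2, 3): -10.17, (2, 4): 467.39, (3, 4): -75.93}

# Gross (zero-order) correlation for every pair of variables.
r = {(i, j): cross[(i, j)] / math.sqrt(sq[i] * sq[j])
     for i, j in combinations(sorted(sq), 2)}

for pair, value in r.items():
    print(pair, f"{value:+.4f}")
```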

Two independent variables: multiple correlation. Naturally, we
can expect to estimate suicide rates more accurately if we take two
independent variables into consideration, rather than only one. Hence, let
us make estimates from both average age and per cent male. The
estimating-equation type is

$$X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3,$$

or, in terms of deviations,

$$x_{c1.23} = b_{12.3}x_2 + b_{13.2}x_3.$$

The 1.23 subscripts after $X_c$ and $a$ tell us that we are estimating values
of $X_1$ (suicide rates) from variables $X_2$ (average age) and $X_3$ (per
cent male). The first b indicates the normal change in suicide rates
associated with a unit change in average age for regions that have the same
per cent male composition; the second b tells us the normal change in
suicide rates associated with a unit change in per cent male for regions of
the same average age.

The normal equations required are:

I. $\sum X_1 = Na_{1.23} + b_{12.3}\sum X_2 + b_{13.2}\sum X_3$;

II. $\sum X_1X_2 = a_{1.23}\sum X_2 + b_{12.3}\sum X_2^2 + b_{13.2}\sum X_2X_3$;

III. $\sum X_1X_3 = a_{1.23}\sum X_3 + b_{12.3}\sum X_2X_3 + b_{13.2}\sum X_3^2$.

Considerable labor may be saved if the normal equations are put in terms
of deviations from the means. In this case, the first equation disappears,
since $\sum x_1$, $\sum x_2$, and $\sum x_3$ are each zero. The remaining two
equations are:

II. $\sum x_1x_2 = b_{12.3}\sum x_2^2 + b_{13.2}\sum x_2x_3$;

III. $\sum x_1x_3 = b_{12.3}\sum x_2x_3 + b_{13.2}\sum x_3^2$.

Making the required substitutions, we have:

II. $64.02 = 109.91b_{12.3} - 10.17b_{13.2}$;

III. $8.68 = -10.17b_{12.3} + 8.44b_{13.2}$.

Solving these simultaneous equations gives:

$$b_{12.3} = +0.76267;$$
$$b_{13.2} = +1.94741.$$

To get $a_{1.23}$, use equation I, dividing it by N, obtaining:

$$\bar{X}_1 = a_{1.23} + b_{12.3}\bar{X}_2 + b_{13.2}\bar{X}_3.$$
$$a_{1.23} = \bar{X}_1 - b_{12.3}\bar{X}_2 - b_{13.2}\bar{X}_3,$$
$$= 11.243158 - (0.76267)(30.144737) - (1.94741)(49.682632),$$
$$= -108.50.$$

The estimating equation, then, is

$$X_{c1.23} = -108.50 + 0.763X_2 + 1.947X_3.$$

The explained variation⁵ is

$$\sum x_{c1.23}^2 = b_{12.3}\sum x_1x_2 + b_{13.2}\sum x_1x_3,$$
$$= (0.76267)(64.02) + (1.94741)(8.68),$$
$$= 65.730.$$

The other measures of relationship are now computed in a manner
precisely similar to that employed when there was only one independent
variable.

$$\sum x_{s1.23}^2 = \sum x_1^2 - \sum x_{c1.23}^2 = 103.78 - 65.730 = 38.050.$$

$$s_{1.23}^2 = \frac{38.050}{19} = 2.003; \qquad s_{1.23} = 1.42.$$

$$R_{1.23}^2 = \frac{65.730}{103.78} = 0.63336; \qquad R_{1.23} = 0.7958.$$

Since the coefficient of multiple determination, $R_{1.23}^2$, is 0.6334, we
have explained 63 per cent of the variation present in $X_1$. Notice that
$R_{1.23}^2$ is greater than either $r_{12}^2$ or $r_{13}^2$; the value of
$r_{12}^2$ was found to be 0.3593, while $r_{13}^2$ was 0.0860.

The standard error of estimate $s_{1.23}$ was ascertained to be 1.42, which
is smaller than either $s_{1.2} = 1.87$ or $s_{1.3} = 2.23$. Estimates made
of $X_1$ using the two independent variables $X_2$ and $X_3$ will be more
satisfactory than estimates made by use of either $X_2$ or $X_3$ alone. More
specifically, the standard deviation of the $X_1$ values around the
estimating equation

$$X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3$$

is less than the standard deviation of the $X_1$ values around

$$X_{c1.2} = a_{1.2} + b_{12}X_2$$

or around

$$X_{c1.3} = a_{1.3} + b_{13}X_3.$$

⁵ Also, $\sum x_{c1.23}^2 = \sum X_{c1.23}^2 - \bar{X}_1\sum X_1$, where
$\sum X_{c1.23}^2 = a_{1.23}\sum X_1 + b_{12.3}\sum X_1X_2 + b_{13.2}\sum X_1X_3$.

Two independent variables: partial correlation. When only
one independent variable (age) was considered, the explained variation
was $\sum x_{c1.2}^2 = 37.290$. When two independent variables (age and per
cent male) were used, the explained variation was increased to
$\sum x_{c1.23}^2 = 65.730$. Therefore, the increase in the variation
explained by per cent male is

$$\sum x_{c1.23}^2 - \sum x_{c1.2}^2 = 65.730 - 37.290 = 28.440.$$


After taking age alone into consideration, the variation remaining to be
explained was

$$\sum x_{s1.2}^2 = \sum x_1^2 - \sum x_{c1.2}^2 = 103.78 - 37.290 = 66.490.$$

The proportion of the variation previously unexplained, then, which was
explained by including per cent male also, is the ratio

$$\frac{28.440}{66.490} = 0.42773.$$

As noted before, this ratio is known as the coefficient of partial
determination, the square root of which is the coefficient of partial
correlation. That is,

$$r_{13.2}^2 = \frac{\sum x_{c1.23}^2 - \sum x_{c1.2}^2}{\sum x_1^2 - \sum x_{c1.2}^2} = \frac{65.730 - 37.290}{66.490} = 0.42773;$$

$$r_{13.2} = +0.6540.$$


The sign of this coefficient of partial correlation is the same as the sign
of $b_{13.2}$ in the estimating equation. This coefficient is a measure of
the closeness of relationship between suicide rate and per cent male when
age has been held constant statistically: it is the simple correlation
coefficient which would be expected for regions of the same average age.
As previously stated, if the numerator and denominator of the above
expression for $r_{13.2}^2$ are both divided by $\sum x_1^2$, we obtain a
formula showing the relationship between the partial determination
coefficient and two gross determination coefficients. Thus,

$$r_{13.2}^2 = \frac{R_{1.23}^2 - r_{12}^2}{1 - r_{12}^2} = \frac{0.63336 - 0.35932}{0.64068} = 0.42773;$$

$$r_{13.2} = +0.6540.$$
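The two routes to the coefficient of partial determination, one through amounts of variation and one through proportions, can be compared side by side. This is a small sketch using the figures of the worked example; the names are illustrative.

```python
import math

total, expl_age, expl_both = 103.78, 37.290, 65.730
b13_2 = 1.94741   # only its sign is used: r13.2 takes the sign of b13.2

# (1) Increase in explained variation over variation previously unexplained.
r2_13_2 = (expl_both - expl_age) / (total - expl_age)

# (2) The same ratio after dividing numerator and denominator by the total.
R2_1_23 = expl_both / total
r2_12 = expl_age / total
r2_13_2_alt = (R2_1_23 - r2_12) / (1 - r2_12)

r13_2 = math.copysign(math.sqrt(r2_13_2), b13_2)
print(round(r2_13_2, 5), round(r2_13_2_alt, 5), round(r13_2, 4))
```

Substituting the explained variation for per cent male alone (8.927) for `expl_age` gives $r_{12.3}^2$ in the same way.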


Note that each of the values recorded in this formula is that in the
preceding formula divided by 103.78 (in fact, this is the procedure we have
already employed to obtain $R_{1.23}^2$ and $r_{12}^2$). This formula may then
be used as a check⁶ on the final division needed to compute $R_{1.23}^2$ and
$r_{12}^2$. Also, it may be used when $r_{12}^2$ is computed by some procedure
other than

$$r_{12}^2 = \frac{\sum x_{c1.2}^2}{\sum x_1^2},$$

or when the coefficients of determination, or coefficients of
correlation, but not the original data, are given.

As a companion measure to $r_{13.2}$, we should obtain the partial
coefficient $r_{12.3}$, which measures the relationship between suicide rate
and age when per cent male has been held constant. This is done by finding
the increase in the variation of the computed values by using age and per
cent male in our estimating equation rather than using per cent male
alone. Thus:

$$r_{12.3}^2 = \frac{\sum x_{c1.23}^2 - \sum x_{c1.3}^2}{\sum x_{s1.3}^2} = \frac{65.730 - 8.927}{94.853}$$

$$= \frac{R_{1.23}^2 - r_{13}^2}{1 - r_{13}^2} = \frac{0.63336 - 0.08602}{1 - 0.08602} = 0.59885;$$

$$r_{12.3} = +0.7739.$$

Partial coefficients, such as $r_{13.2}$ and $r_{12.3}$, are often referred to
as first-order coefficients, since one variable has been held constant.
Simple coefficients are called zero-order coefficients, since no variables
were held constant. Later in the chapter, we shall consider $r_{12.34}$,
$r_{13.24}$, and $r_{14.23}$, which are second-order coefficients. Stated
generally, the order designation indicates the number of variables that
have been held constant statistically.

The gross correlation between suicide rate and age, $r_{12}$, it will be
recalled, was +0.5994. Removing the effect of variations in per cent
male from both variables has increased the relationship materially, since
$r_{12.3} = +0.7739$. Similarly, $r_{13}$, the gross correlation between
suicide rate and per cent male, was +0.2933. Removing the effect of
variations in age resulted in $r_{13.2} = +0.6540$, again a decided increase.

⁶ Note, however, that there is a tendency for the numerator and denominator
to lose a significant digit because of the division by $\sum x_1^2$.

Relationship between $R_{1.23}$ and the measures of gross and partial
correlation. The reader may be surprised to note that $R_{1.23}$ is but
0.7958 when $r_{12} = +0.5994$ and $r_{13} = +0.2933$. It is not a
characteristic of these measures that the multiple coefficient is the sum
of the two gross coefficients. The relationship is more complex than
that.⁷ It may be said, however, that, for given values of $r_{12}$ and
$r_{13}$ having the same sign, the less the duplication in the independent
variables (that is, the lower their positive or the higher their negative
correlation; $r_{23}$ in this case), the higher will be the multiple
correlation. In the present instance, $r_{23} = -0.3339$, and hence the
addition of either age or per cent male materially improves the correlation
over that obtained from the use of either independent variable alone.

Neither is the multiple coefficient of correlation the sum of the two
partial coefficients. However, there is an additive relationship (derived
from the expressions just given for $r_{13.2}^2$ and $r_{12.3}^2$) which may
be written in either of two forms:

$$R_{1.23}^2 = r_{12}^2 + r_{13.2}^2(1 - r_{12}^2),$$
$$= 0.35932 + (0.42773)(1 - 0.35932) = 0.6334, \text{ or}$$

$$R_{1.23}^2 = r_{13}^2 + r_{12.3}^2(1 - r_{13}^2),$$
$$= 0.08602 + (0.59885)(1 - 0.08602) = 0.6334.$$

It is interesting to note the thought behind these equations. The
first one, for example, involves the sum of: (1) the proportion of variation
explained by using one independent variable and (2) the product of (a)
the proportion of variation unexplained by that independent variable,
$1 - r_{12}^2$, and (b) the proportion of (a) explained as a result of using
the other independent variable in addition to the first one, $r_{13.2}^2$.
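Both additive forms, and the closed form in terms of the three gross coefficients, should lead to the same coefficient of multiple determination apart from rounding. The sketch below assumes the rounded coefficients reported in the text, so agreement is expected only to about four decimals; the closed form used is the standard identity for two independent variables.

```python
r12, r13, r23 = 0.5994, 0.2933, -0.3339      # gross coefficients
r2_13_2, r2_12_3 = 0.42773, 0.59885          # partial determinations

# Additive form: age entered first, then per cent male; and the reverse order.
R2_age_first = r12 ** 2 + r2_13_2 * (1 - r12 ** 2)
R2_male_first = r13 ** 2 + r2_12_3 * (1 - r13 ** 2)

# Closed form in terms of the three gross coefficients.
R2_closed = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)

print(round(R2_age_first, 4), round(R2_male_first, 4), round(R2_closed, 4))
```

Because $r_{23}$ is negative here, the product term in the closed form adds to the numerator, which is why the multiple coefficient exceeds what either gross coefficient alone would suggest.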

Three independent variables: multiple correlation. In the
preceding paragraphs, we considered the two independent variables, average
age, $X_2$, and per cent male, $X_3$. If we add a third independent variable,
business-failure rate, $X_4$, we use an estimating equation of the type

$$X_{c1.234} = a_{1.234} + b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4.$$

To obtain the four constants, four normal equations are required if we use
X-values. They are:

I. $\sum X_1 = Na_{1.234} + b_{12.34}\sum X_2 + b_{13.24}\sum X_3 + b_{14.23}\sum X_4$;

II. $\sum X_1X_2 = a_{1.234}\sum X_2 + b_{12.34}\sum X_2^2 + b_{13.24}\sum X_2X_3 + b_{14.23}\sum X_2X_4$;

III. $\sum X_1X_3 = a_{1.234}\sum X_3 + b_{12.34}\sum X_2X_3 + b_{13.24}\sum X_3^2 + b_{14.23}\sum X_3X_4$;

IV. $\sum X_1X_4 = a_{1.234}\sum X_4 + b_{12.34}\sum X_2X_4 + b_{13.24}\sum X_3X_4 + b_{14.23}\sum X_4^2$.


⁷ The relationship is as follows:

$$R_{1.23}^2 = \frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{23}^2}.$$

In this case,

$$R_{1.23}^2 = \frac{0.3593 + 0.0860 - 2(0.5994)(0.2933)(-0.3339)}{1 - 0.1115} = 0.6333.$$

However, by using x-values, we eliminate normal equation I, as before,
giving

II. $\sum x_1x_2 = b_{12.34}\sum x_2^2 + b_{13.24}\sum x_2x_3 + b_{14.23}\sum x_2x_4$;

III. $\sum x_1x_3 = b_{12.34}\sum x_2x_3 + b_{13.24}\sum x_3^2 + b_{14.23}\sum x_3x_4$;

IV. $\sum x_1x_4 = b_{12.34}\sum x_2x_4 + b_{13.24}\sum x_3x_4 + b_{14.23}\sum x_4^2$.

Substituting in normal equations II, III, and IV the sums of squared
deviations and the sums of products of deviations, obtained earlier, we
have

II. $64.02 = 109.91b_{12.34} - 10.17b_{13.24} + 467.39b_{14.23}$;

III. $8.68 = -10.17b_{12.34} + 8.44b_{13.24} - 75.93b_{14.23}$;

IV. $431.80 = 467.39b_{12.34} - 75.93b_{13.24} + 5{,}909.20b_{14.23}$.

Since the procedure for solving three simultaneous equations was given
on pages 487-489, it will not be repeated here. The solution yields

$b_{12.34} = +0.53534$;

$b_{13.24} = +2.20484$;

$b_{14.23} = +0.05906$.
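For readers who prefer to check the solution by machine, the three equations can be solved by ordinary Gaussian elimination. The routine below is a generic sketch, not the hand procedure of pages 487-489; the function name `solve` and the matrix layout are illustrative.

```python
def solve(A, y):
    """Solve A b = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [list(row) + [rhs] for row, rhs in zip(A, y)]   # augmented matrix
    for col in range(n):
        # Bring the largest remaining entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    b = [0.0] * n
    for r in range(n - 1, -1, -1):                      # back substitution
        tail = sum(M[r][k] * b[k] for k in range(r + 1, n))
        b[r] = (M[r][n] - tail) / M[r][r]
    return b

# Coefficients of normal equations II, III, and IV in deviation form.
A = [[109.91, -10.17, 467.39],
     [-10.17, 8.44, -75.93],
     [467.39, -75.93, 5909.20]]
y = [64.02, 8.68, 431.80]

b12_34, b13_24, b14_23 = solve(A, y)
print(round(b12_34, 5), round(b13_24, 5), round(b14_23, 5))
```

The coefficient matrix is symmetric because each product sum appears in two equations, which is also what makes systematic hand methods for these systems economical.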

If we write normal equation I in the form

$$a_{1.234} = \bar{X}_1 - b_{12.34}\bar{X}_2 - b_{13.24}\bar{X}_3 - b_{14.23}\bar{X}_4,$$

we can substitute the values of the arithmetic means from Table 21.2 and
the b-values just given, obtaining

$$a_{1.234} = 11.243158 - (0.53534)(30.144737) - (2.20484)(49.682632) - (0.05906)(32.635789),$$
$$= -116.36.$$

The estimating equation, then, is

$$X_{c1.234} = -116.36 + 0.535X_2 + 2.205X_3 + 0.0591X_4.$$
Explained variation is

$$\sum x_{c1.234}^2 = b_{12.34}\sum x_1x_2 + b_{13.24}\sum x_1x_3 + b_{14.23}\sum x_1x_4,$$
$$= (0.53534)(64.02) + (2.20484)(8.68) + (0.05906)(431.80),$$
$$= 78.913,$$

and unexplained variation is

$$\sum x_{s1.234}^2 = \sum x_1^2 - \sum x_{c1.234}^2 = 103.78 - 78.913 = 24.867.$$

We can now compute the standard error of estimate, which is

$$s_{1.234} = \sqrt{\frac{24.867}{19}} = 1.14.$$

The coefficient of multiple determination and the coefficient of multiple
correlation are

$$R_{1.234}^2 = \frac{78.913}{103.78} = 0.76039; \qquad R_{1.234} = 0.8720.$$
Before proceeding to compute partial coefficients, it is desirable to see
what improvement in our relationship has resulted from using variable
$X_4$. It will be recalled that $R_{1.23}^2 = 0.6334$, indicating that we had
explained 63 per cent of the variation in $X_1$ by referring to $X_2$ and
$X_3$. We have just found $R_{1.234}^2$ to be 0.7604. Now, by use of the three
independent variables, we have explained 76 per cent of the variation in
the dependent variable.⁸ Not only does $R_{1.234}^2$ exceed $R_{1.23}^2$, but
it is also larger than either $R_{1.24}^2$ or $R_{1.34}^2$. Neither of these
last two coefficients has been previously computed. They are

$$R_{1.24}^2 = 0.4218 \quad\text{and}\quad R_{1.34}^2 = 0.5654.$$

It had been noted previously (page 513) that $R_{1.23}^2$ was larger than
either $r_{12}^2$ or $r_{13}^2$. The reader can verify (1) that $R_{1.24}^2$
exceeds either $r_{12}^2$ or $r_{14}^2$, and (2) that $R_{1.34}^2$ is larger
than either $r_{13}^2$ or $r_{14}^2$.

As the value of $R^2$ or $R$ increases with the addition of appropriate
independent variables, the value of the standard error of estimate
decreases. We previously found $s_{1.23}$ to be 1.42; now we see that
$s_{1.234} = 1.14$. The values of $s_{1.24}$ and $s_{1.34}$ (neither of which
was computed before) are each larger than $s_{1.234}$; they are

$$s_{1.24} = 1.78 \quad\text{and}\quad s_{1.34} = 1.54.$$

It is clear that estimates of suicide rates made from the use of all three
of the independent variables will be more satisfactory than estimates made
by using any two of them. Stated more exactly, the standard deviation
of the $X_1$ values around the estimating equation

$$X_{c1.234} = a_{1.234} + b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4$$

is smaller than the standard deviation of the $X_1$ values around

$$X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3,$$

or around

$$X_{c1.24} = a_{1.24} + b_{12.4}X_2 + b_{14.2}X_4,$$

or around

$$X_{c1.34} = a_{1.34} + b_{13.4}X_3 + b_{14.3}X_4.$$

⁸ It must be remembered that adding another independent variable causes the
loss of an additional degree of freedom. Thus, it may occasionally happen
that the value of $R^2$ may be increased, but the increase may not be
significant. Testing the significance of partial and multiple coefficients
of determination is discussed toward the end of Chapter 26.
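The measures for the two pairs not worked out in the text can be obtained by repeating the two-variable procedure with different subscripts. This is a sketch under the same assumptions as before (deviation sums from the text, divisor N for the standard error); the function name is hypothetical, and the fourth decimal of a result may differ by a unit from the printed value because of rounding.

```python
import math

N = 19
sx1sq = 103.78
sq = {2: 109.91, 3: 8.44, 4: 5909.20}
cross = {(1, 2): 64.02, (1, 3): 8.68, (1, 4): 431.80,
         (2, 3): -10.17, (2, 4): 467.39, (3, 4): -75.93}

def two_variable_fit(i, j):
    """Return (R squared, standard error) for X1 estimated from Xi and Xj."""
    det = sq[i] * sq[j] - cross[(i, j)] ** 2
    # Net coefficients from the two deviation-form normal equations.
    bi = (cross[(1, i)] * sq[j] - cross[(1, j)] * cross[(i, j)]) / det
    bj = (cross[(1, j)] * sq[i] - cross[(1, i)] * cross[(i, j)]) / det
    explained = bi * cross[(1, i)] + bj * cross[(1, j)]
    return explained / sx1sq, math.sqrt((sx1sq - explained) / N)

for pair in [(2, 3), (2, 4), (3, 4)]:
    R2, s = two_variable_fit(*pair)
    print(pair, round(R2, 4), round(s, 2))
```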


Three independent variables: partial correlation. Paralleling
the procedure previously used,

$$r_{14.23}^2 = \frac{\sum x_{c1.234}^2 - \sum x_{c1.23}^2}{\sum x_{s1.23}^2} = \frac{78.913 - 65.730}{103.78 - 65.730} = 0.34647;$$

$$r_{14.23} = +0.5886.$$

Since $r_{14.23}^2 = 0.3465$, the use of independent variable $X_4$ enabled us
to explain 35 per cent of the variation which $X_2$ and $X_3$ had failed to
explain. The sign of $r_{14.23}$ is positive, to agree with the sign of
$b_{14.23}$, and this coefficient measures the relationship between suicide
rate $X_1$ and business-failure rate $X_4$, when $X_2$ and $X_3$ have been
held constant statistically. At a later point we shall obtain the values of
$r_{13.24}$ and $r_{12.34}$, which are, respectively, measures of the
correlation between variables $X_1$ and $X_3$ with $X_2$ and $X_4$ held
constant and between variables $X_1$ and $X_2$ with $X_3$ and $X_4$ held
constant.

The value of $r_{14.23}^2$ may also be obtained from the expression

$$r_{14.23}^2 = \frac{R_{1.234}^2 - R_{1.23}^2}{1 - R_{1.23}^2} = \frac{0.76039 - 0.63336}{1 - 0.63336} = 0.34647;$$

$$r_{14.23} = +0.5886.$$


Four or more independent variables. Although the reader can
probably supply the formulas for multiple and partial correlation when
more than three independent variables are to be used, a set of generalized
expressions may be helpful. The formulas which follow are expansions
of those already used; generalizations of certain formulas which have not
yet been employed will be given at the appropriate later locations. For
m variables, we have:⁹

Estimating equation:

$$X_{c1.234\ldots m} = a_{1.234\ldots m} + b_{12.34\ldots m}X_2 + b_{13.24\ldots m}X_3 + b_{14.23\ldots m}X_4 + \cdots + b_{1m.23\ldots(m-1)}X_m.$$

* When there are four or more independent variables, it is advisable to use the
Doolittle method (or some other systematic procedure) for the solution of the
simultaneous equations. The Doolittle method was described on pages 498-503.

Normal equations, in deviation form:

$$\text{I.}\quad a_{1.234 \ldots m} = \bar{X}_1 - b_{12.34 \ldots m}\bar{X}_2 - b_{13.24 \ldots m}\bar{X}_3
  - b_{14.23 \ldots m}\bar{X}_4 - \cdots - b_{1m.23 \ldots (m-1)}\bar{X}_m.$$

$$\text{II.}\quad \Sigma x_1x_2 = b_{12.34 \ldots m}\Sigma x_2^2 + b_{13.24 \ldots m}\Sigma x_2x_3
  + b_{14.23 \ldots m}\Sigma x_2x_4 + \cdots + b_{1m.23 \ldots (m-1)}\Sigma x_2x_m.$$

$$\text{III.}\quad \Sigma x_1x_3 = b_{12.34 \ldots m}\Sigma x_2x_3 + b_{13.24 \ldots m}\Sigma x_3^2
  + b_{14.23 \ldots m}\Sigma x_3x_4 + \cdots + b_{1m.23 \ldots (m-1)}\Sigma x_3x_m.$$

$$\text{IV.}\quad \Sigma x_1x_4 = b_{12.34 \ldots m}\Sigma x_2x_4 + b_{13.24 \ldots m}\Sigma x_3x_4
  + b_{14.23 \ldots m}\Sigma x_4^2 + \cdots + b_{1m.23 \ldots (m-1)}\Sigma x_4x_m.$$

$$\cdots$$

$$m.\quad \Sigma x_1x_m = b_{12.34 \ldots m}\Sigma x_2x_m + b_{13.24 \ldots m}\Sigma x_3x_m
  + b_{14.23 \ldots m}\Sigma x_4x_m + \cdots + b_{1m.23 \ldots (m-1)}\Sigma x_m^2.$$


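Explained variation:

$$\Sigma x_{c1.234 \ldots m}^2 = b_{12.34 \ldots m}\Sigma x_1x_2 + b_{13.24 \ldots m}\Sigma x_1x_3
  + b_{14.23 \ldots m}\Sigma x_1x_4 + \cdots + b_{1m.23 \ldots (m-1)}\Sigma x_1x_m.$$

Unexplained variation:

$$\Sigma x_{s1.234 \ldots m}^2 = \Sigma x_1^2 - \Sigma x_{c1.234 \ldots m}^2.$$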


Standard error of estimate:

$$s_{1.234 \ldots m} = \sqrt{\frac{\Sigma x_{s1.234 \ldots m}^2}{N}}.$$



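Coefficient of multiple determination:

$$R_{1.234 \ldots m}^2 = \frac{\Sigma x_{c1.234 \ldots m}^2}{\Sigma x_1^2}
  = r_{12}^2 + r_{13.2}^2(1 - r_{12}^2) + r_{14.23}^2(1 - R_{1.23}^2) + \cdots
  + r_{1m.23 \ldots (m-1)}^2(1 - R_{1.234 \ldots (m-1)}^2).$$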


Coefficients of partial determination:

$$r_{1m.23 \ldots (m-1)}^2
  = \frac{\Sigma x_{c1.234 \ldots m}^2 - \Sigma x_{c1.234 \ldots (m-1)}^2}{\Sigma x_{s1.234 \ldots (m-1)}^2}
  = \frac{R_{1.234 \ldots m}^2 - R_{1.234 \ldots (m-1)}^2}{1 - R_{1.234 \ldots (m-1)}^2}.$$

$$r_{12.34 \ldots m}^2
  = \frac{\Sigma x_{c1.234 \ldots m}^2 - \Sigma x_{c1.34 \ldots m}^2}{\Sigma x_{s1.34 \ldots m}^2}
  = \frac{R_{1.234 \ldots m}^2 - R_{1.34 \ldots m}^2}{1 - R_{1.34 \ldots m}^2}.$$

Multiple-partial coefficients. Just as the coefficient of partial
determination measures: (1) the increase in the amount of variation of
the computed values of the dependent variable resulting from the
introduction of another independent variable relative to (2) the variation
which had not been explained before the introduction of the new variable,
so the multiple-partial coefficient of determination measures the
relative increase resulting from the introduction of two or more new
independent variables. For example,

$$r_{1(34).2}^2 = \frac{\Sigma x_{c1.234}^2 - \Sigma x_{c1.2}^2}{\Sigma x_{s1.2}^2}
               = \frac{R_{1.234}^2 - r_{12}^2}{1 - r_{12}^2}.$$

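All of the values called for in these expressions have already been
obtained, so we may compute

$$r_{1(34).2}^2 = \frac{\Sigma x_{c1.234}^2 - \Sigma x_{c1.2}^2}{\Sigma x_1^2 - \Sigma x_{c1.2}^2}
               = \frac{78.913 - 37.290}{103.78 - 37.290} = 0.6260,$$

or

$$r_{1(34).2}^2 = \frac{R_{1.234}^2 - r_{12}^2}{1 - r_{12}^2}
               = \frac{0.76039 - 0.35932}{1 - 0.35932} = 0.6260.$$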


The value of $r_{1(34).2}^2$ tells us that, of the variation in $X_1$ which was
not explained by $X_2$, 63 per cent has been explained by $X_3$ and $X_4$. As
would be expected, $r_{1(34).2}^2$ is larger than either $r_{13.2}^2 = 0.4277$
or $r_{14.2}^2 = 0.0977$. Note that the coefficient of multiple-partial
correlation ($r_{1(34).2} = 0.7912$ in this instance) has no sign, since the
relationship between the dependent variable and each independent variable in
parentheses may be either positive or negative. In this case, both are
positive.
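As a check on the arithmetic, the multiple-partial coefficient can be recomputed from the variation figures quoted above. This is a minimal sketch; the variable names are illustrative assumptions, not notation from the text.

```python
# Variation figures quoted in the text (sums of squared deviations).
total_variation = 103.78   # sum of x1^2
explained_2 = 37.290       # variation explained by X2 alone
explained_234 = 78.913     # variation explained by X2, X3 and X4

# Share of the variation left unexplained by X2 that X3 and X4 jointly explain.
r2_1_34_2 = (explained_234 - explained_2) / (total_variation - explained_2)
r_1_34_2 = r2_1_34_2 ** 0.5   # carries no sign, as the text notes

# The single-variable partial coefficients of determination, for comparison.
r2_13_2 = 0.6540 ** 2
r2_14_2 = 0.3125 ** 2

print(round(r2_1_34_2, 4), round(r_1_34_2, 4))
print(r2_1_34_2 > max(r2_13_2, r2_14_2))
```

Adding two variables can never explain less than adding either one alone, which is why the multiple-partial figure exceeds both single partial figures.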

The relationship between the multiple correlation coefficient, the
partial correlation coefficient, and the multiple-partial correlation
coefficient may be understood more clearly if it is pointed out: (1) that
$R_{1.234}$ is simple correlation of $X_1$ with $X_{c1.234}$; (2) that
$r_{12.34}$ is simple correlation of $X_{s1.34}$ with $X_{s2.34}$, that is,
simple correlation of $(X_1 - b_{13.4}X_3 - b_{14.3}X_4)$ with
$(X_2 - b_{23.4}X_3 - b_{24.3}X_4)$; and (3) that $r_{1(23).4}$ is multiple
correlation of $X_{s1.4}$ with $X_{s2.4}$ and $X_{s3.4}$.

ANOTHER APPROACH TO MULTIPLE AND PARTIAL
CORRELATION COEFFICIENTS

Sometimes one is presented with the results of a study which show only
the zero-order correlation coefficients for a number of variables. If
multiple and partial coefficients are wanted, it is possible to obtain them
from the zero-order coefficients. The formulas which we shall use for the
partial coefficients will also serve to indicate why partial correlation
coefficients sometimes become larger and sometimes smaller as more
variables are held constant. In the preceding discussion we considered,
first, multiple correlation coefficients and then partial coefficients. For
the present treatment, it will be advantageous to consider partial
coefficients first, since the multiple coefficients for four or more
variables are most conveniently obtained by using certain of the partial
coefficients.

First-order partial correlation coefficients. Any first-order
coefficient may be determined from the values of three zero-order
coefficients. For example,

$$r_{13.2} = \frac{r_{13} - r_{12}r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}}.$$

Since we shall compute eight of these first-order coefficients, and the
reader may wish to ascertain the values of others, there are listed below
all of the zero-order $r$, $r^2$, $1 - r^2$, and $\sqrt{1 - r^2}$ values. We
shall use some of the $1 - r^2$ values for computing multiple coefficients.


Coefficient      r         r^2       1 - r^2    sqrt(1 - r^2)

r_12          +0.5994     0.3593     0.6407       0.8004
r_13          +0.2933     0.0860     0.9140       0.9560
r_14          +0.5514     0.3040     0.6960       0.8343
r_23          -0.3339     0.1115     0.8885       0.9426
r_24          +0.5800     0.3364     0.6636       0.8146
r_34          -0.3400     0.1156     0.8844       0.9404

When four variables are involved in a correlation problem, there are
twelve possible first-order coefficients. For our purposes, we shall
compute only eight of these: the six having $X_1$ as the dependent variable
and two others, $r_{24.3}$ and $r_{34.2}$, which will be used to obtain
second-order partial coefficients. If our objective were merely to obtain the
three second-order coefficients, shown in the next section, we would not need
the last two of the six first-order coefficients having $X_1$ as the dependent
variable.


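$$r_{13.2} = \frac{r_{13} - r_{12}r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}}
          = \frac{(0.2933) - (0.5994)(-0.3339)}{(0.8004)(0.9426)} = +0.6540.$$

$$r_{14.2} = \frac{r_{14} - r_{12}r_{24}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{24}^2}}
          = \frac{(0.5514) - (0.5994)(0.5800)}{(0.8004)(0.8146)} = +0.3125.$$

$$r_{14.3} = \frac{r_{14} - r_{13}r_{34}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{34}^2}}
          = \frac{(0.5514) - (0.2933)(-0.3400)}{(0.9560)(0.9404)} = +0.7243.$$

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}
          = \frac{(0.5994) - (0.2933)(-0.3339)}{(0.9560)(0.9426)} = +0.7738.$$

$$r_{13.4} = \frac{r_{13} - r_{14}r_{34}}{\sqrt{1 - r_{14}^2}\,\sqrt{1 - r_{34}^2}}
          = \frac{(0.2933) - (0.5514)(-0.3400)}{(0.8343)(0.9404)} = +0.6128.$$

$$r_{12.4} = \frac{r_{12} - r_{14}r_{24}}{\sqrt{1 - r_{14}^2}\,\sqrt{1 - r_{24}^2}}
          = \frac{(0.5994) - (0.5514)(0.5800)}{(0.8343)(0.8146)} = +0.4114.$$

$$r_{24.3} = \frac{r_{24} - r_{23}r_{34}}{\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{34}^2}}
          = \frac{(0.5800) - (-0.3339)(-0.3400)}{(0.9426)(0.9404)} = +0.5262.$$

$$r_{34.2} = \frac{r_{34} - r_{23}r_{24}}{\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{24}^2}}
          = \frac{(-0.3400) - (-0.3339)(0.5800)}{(0.9426)(0.8146)} = -0.1906.$$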
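The eight computations above all follow one pattern, so they can be reproduced with a single small function. The sketch below is illustrative only: the dictionary layout and function names are assumptions, and the six zero-order coefficients are those listed in the text.

```python
from math import sqrt

# Zero-order coefficients from the text: 1 = suicide rate, 2 = average age,
# 3 = per cent male, 4 = business-failure rate.
ZERO_ORDER = {(1, 2): 0.5994, (1, 3): 0.2933, (1, 4): 0.5514,
              (2, 3): -0.3339, (2, 4): 0.5800, (3, 4): -0.3400}


def zero_order(i, j):
    """Look up r_ij regardless of the order in which i and j are given."""
    return ZERO_ORDER[(min(i, j), max(i, j))]


def first_order(i, j, k):
    """r_ij.k: correlation of variables i and j with variable k held constant."""
    r_ij, r_ik, r_jk = zero_order(i, j), zero_order(i, k), zero_order(j, k)
    return (r_ij - r_ik * r_jk) / (sqrt(1 - r_ik ** 2) * sqrt(1 - r_jk ** 2))


# The eight first-order coefficients computed in the text, in the same order.
for i, j, k in [(1, 3, 2), (1, 4, 2), (1, 4, 3), (1, 2, 3),
                (1, 3, 4), (1, 2, 4), (2, 4, 3), (3, 4, 2)]:
    print(f"r{i}{j}.{k} = {first_order(i, j, k):+.4f}")
```

Because the function carries full precision rather than four-place table values, its results may differ from the printed ones in the last decimal place.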


We can now see why first-order coefficients are sometimes larger and
sometimes smaller than zero-order coefficients. Consider three of the
first-order coefficients: (1) $r_{13.2}$ is larger than $r_{13}$. Since
$r_{12}$ and $r_{23}$ have unlike signs, and $r_{13}$ is positive, the value of
the numerator of the expression for $r_{13.2}$ is larger than $r_{13}$. The
fact that the denominator is less than 1.0 serves further to increase the
result. (2) $r_{14.2}$ is smaller than $r_{14}$. Since the product of $r_{12}$
and $r_{24}$ does not exceed $r_{14}$, since $r_{12}$ and $r_{24}$ have like
signs, and since $r_{14}$ is positive, the value of the numerator of the
expression for $r_{14.2}$ is smaller than $r_{14}$. Although the denominator is
less than 1.0, it was not enough smaller than 1.0 to increase the result
sufficiently to make it equal or exceed $r_{14}$. (3) $r_{34.2}$ is smaller
(that is, shows a lower degree of correlation) than $r_{34}$. Since the
product of $r_{23}$ and $r_{24}$ does not exceed $r_{34}$, since $r_{23}$ and
$r_{24}$ have unlike signs, and since $r_{34}$ is negative, the value of the
numerator in the expression for $r_{34.2}$ is a smaller negative value than
$r_{34}$. The denominator, though smaller than 1.0, was not small enough to
increase the result to a point where it would equal or exceed $r_{34}$.

Proof that these formulas are the equivalent of those we have been using is
given in Appendix S, section 21.1. The labor of computation can be materially
shortened if values of $\sqrt{1 - r^2}$ are looked up in J. R. Miner, Tables of
$\sqrt{1 - r^2}$ and $1 - r^2$ for Use in Partial Correlation and Trigonometry,
Johns Hopkins Press, Baltimore, 1936, or Truman Lee Kelley, The Kelley
Statistical Tables, revised edition, The Macmillan Company, 1918.

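Second-order partial correlation coefficients. Second-order
coefficients may be obtained from first-order coefficients. We shall compute
only those second-order coefficients having $X_1$ as the dependent variable.
They are:

$$r_{14.23} = \frac{r_{14.2} - r_{13.2}r_{34.2}}{\sqrt{1 - r_{13.2}^2}\,\sqrt{1 - r_{34.2}^2}}
           = \frac{(0.3125) - (0.6540)(-0.1906)}{\sqrt{1 - (0.6540)^2}\,\sqrt{1 - (-0.1906)^2}}
           = +0.5887.$$

$$r_{13.24} = \frac{r_{13.2} - r_{14.2}r_{34.2}}{\sqrt{1 - r_{14.2}^2}\,\sqrt{1 - r_{34.2}^2}}
           = \frac{(0.6540) - (0.3125)(-0.1906)}{\sqrt{1 - (0.3125)^2}\,\sqrt{1 - (-0.1906)^2}}
           = +0.7653.$$

$$r_{12.34} = \frac{r_{12.3} - r_{14.3}r_{24.3}}{\sqrt{1 - r_{14.3}^2}\,\sqrt{1 - r_{24.3}^2}}
           = \frac{(0.7738) - (0.7243)(0.5262)}{\sqrt{1 - (0.7243)^2}\,\sqrt{1 - (0.5262)^2}}
           = +0.6697.$$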


Alternative formulas, giving the same results, are available for all three
of the second-order coefficients. They are:

$$r_{14.23} = \frac{r_{14.3} - r_{12.3}r_{24.3}}{\sqrt{1 - r_{12.3}^2}\,\sqrt{1 - r_{24.3}^2}};$$

$$r_{13.24} = \frac{r_{13.4} - r_{12.4}r_{23.4}}{\sqrt{1 - r_{12.4}^2}\,\sqrt{1 - r_{23.4}^2}};$$

$$r_{12.34} = \frac{r_{12.4} - r_{13.4}r_{23.4}}{\sqrt{1 - r_{13.4}^2}\,\sqrt{1 - r_{23.4}^2}}.$$

Notice that $r_{14.23}$ is larger than $r_{14.2}$. On the other hand,
$r_{14.23}$ is smaller than $r_{14.3}$. Similar comparisons may be made between
the other second-order coefficients and the appropriate first-order
coefficients.

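An expression for $r_{1m.23 \ldots (m-1)}$ is

$$r_{1m.23 \ldots (m-1)} =
  \frac{r_{1m.23 \ldots (m-2)} - r_{1(m-1).23 \ldots (m-2)}\; r_{m(m-1).23 \ldots (m-2)}}
       {\sqrt{1 - r_{1(m-1).23 \ldots (m-2)}^2}\,\sqrt{1 - r_{m(m-1).23 \ldots (m-2)}^2}}.$$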
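The general expression is recursive: a coefficient of any order is built from three coefficients of the next lower order. A minimal sketch of that recursion follows; the function name `partial_r` and the tuple of held-constant variables are illustrative assumptions, and the zero-order values are those of the text.

```python
from functools import lru_cache
from math import sqrt

# Zero-order coefficients from the text.
ZERO_ORDER = {(1, 2): 0.5994, (1, 3): 0.2933, (1, 4): 0.5514,
              (2, 3): -0.3339, (2, 4): 0.5800, (3, 4): -0.3400}


@lru_cache(maxsize=None)
def partial_r(i, j, held=()):
    """r_ij.held, built up from coefficients one order lower.

    `held` is a tuple of the variables held constant. The last variable in
    the tuple is the one dropped at each step of the recursion.
    """
    if not held:
        return ZERO_ORDER[(min(i, j), max(i, j))]
    k, rest = held[-1], held[:-1]
    r_ij = partial_r(i, j, rest)
    r_ik = partial_r(i, k, rest)
    r_jk = partial_r(j, k, rest)
    return (r_ij - r_ik * r_jk) / (sqrt(1 - r_ik ** 2) * sqrt(1 - r_jk ** 2))


print(f"r14.23 = {partial_r(1, 4, (2, 3)):+.4f}")
print(f"r13.24 = {partial_r(1, 3, (2, 4)):+.4f}")
print(f"r12.34 = {partial_r(1, 2, (3, 4)):+.4f}")
```

Listing the held variables in a different order, for example `partial_r(1, 4, (3, 2))`, drops a different subscript first and corresponds to the alternative formulas; the result is the same apart from floating-point rounding.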


It is interesting to pause at this point and inspect some of the results of
our computations. Below are shown the zero-order, first-order, and
second-order coefficients involving $X_1$ as the dependent variable:

r_12 = +0.5994.     r_12.3 = +0.7738.     r_12.34 = +0.6697.
                    r_12.4 = +0.4114.

r_13 = +0.2933.     r_13.2 = +0.6540.     r_13.24 = +0.7653.
                    r_13.4 = +0.6128.

r_14 = +0.5514.     r_14.2 = +0.3125.     r_14.23 = +0.5887.
                    r_14.3 = +0.7243.

When no allowance had been made for the effect of other variables, $X_2$
(average age) ranked first and $X_3$ (per cent male) ranked last. When
adjustment was made for $X_4$, per cent male $X_3$ ranked ahead of average
age $X_2$; when adjustment was made for $X_3$, average age $X_2$ was ahead of
business-failure rate $X_4$; when adjustment was made for $X_2$, per cent
male $X_3$ ranked above business-failure rate $X_4$. Finally, when two
independent variables were held constant, per cent male $X_3$ was first and
business-failure rate $X_4$ was last.

Multiple coefficients. It has already been pointed out in footnote 7
that three-variable multiple coefficients may be obtained from the
zero-order coefficients. Thus:

$$R_{1.23}^2 = \frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{23}^2}
            = \frac{0.3593 + 0.0860 - 2(0.5994)(0.2933)(-0.3339)}{0.8885} = 0.6333.$$

$$R_{1.23} = 0.7958.$$

$$R_{1.24}^2 = \frac{r_{12}^2 + r_{14}^2 - 2r_{12}r_{14}r_{24}}{1 - r_{24}^2}
            = \frac{0.3593 + 0.3040 - 2(0.5994)(0.5514)(0.5800)}{0.6636} = 0.4218.$$

$$R_{1.24} = 0.6495.$$

$$R_{1.34}^2 = \frac{r_{13}^2 + r_{14}^2 - 2r_{13}r_{14}r_{34}}{1 - r_{34}^2}
            = \frac{0.0860 + 0.3040 - 2(0.2933)(0.5514)(-0.3400)}{0.8844} = 0.5654.$$

$$R_{1.34} = 0.7519.$$

Other forms of the general expression for $r_{1m.23 \ldots (m-1)}$ may also be
written. However, this is the most logical form, since partial coefficients
are being built up from those of lower order, using in turn variables $X_2$,
$X_3$, $X_4$, ..., $X_m$. It would be possible to drop from the subscript of
the first $r$ in the numerator, not $(m - 1)$, as was done here, but any
subscript other than 1 or $m$. For example, if 3 were dropped, the three
coefficients would have as subscripts: $1m.24 \ldots (m-1)$;
$13.24 \ldots (m-1)$; and $m3.24 \ldots (m-1)$.

From the general formula, given on page 550, we may write the following,
the first one of which was used on page 546:

$$R_{1.23}^2 = r_{12}^2 + r_{13.2}^2(1 - r_{12}^2) = 0.3593 + (0.4277)(0.6407) = 0.6333.$$

$$R_{1.23} = 0.7958.$$

$$R_{1.24}^2 = r_{12}^2 + r_{14.2}^2(1 - r_{12}^2) = 0.3593 + (0.0977)(0.6407) = 0.4219.$$

$$R_{1.24} = 0.6495.$$

$$R_{1.34}^2 = r_{13}^2 + r_{14.3}^2(1 - r_{13}^2) = 0.0860 + (0.5246)(0.9140) = 0.5655.$$

$$R_{1.34} = 0.7520.$$

$$R_{1.234}^2 = r_{12}^2 + r_{13.2}^2(1 - r_{12}^2) + r_{14.23}^2(1 - R_{1.23}^2)
             = 0.3593 + (0.4277)(0.6407) + (0.3466)(0.3667) = 0.7604.$$

$$R_{1.234} = 0.8720.$$
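The build-up of $R^2$ term by term is easy to mechanize. In the sketch below each new squared partial coefficient is applied to the variation still unexplained at that stage; the function names are illustrative assumptions, and the three inputs are the squared coefficients used in the text. The second function is an algebraically equivalent product form, included only as a cross-check.

```python
def multiple_r2(squared_partials):
    """Accumulate R^2 from r12^2, r13.2^2, r14.23^2, ... in that order."""
    r2 = 0.0
    for partial_sq in squared_partials:
        # Each new variable explains its share of what is still unexplained.
        r2 += partial_sq * (1.0 - r2)
    return r2


def multiple_r2_product(squared_partials):
    """Equivalent form: one minus the product of the unexplained shares."""
    unexplained = 1.0
    for partial_sq in squared_partials:
        unexplained *= 1.0 - partial_sq
    return 1.0 - unexplained


chain = [0.3593, 0.4277, 0.3466]   # r12^2, r13.2^2, r14.23^2 from the text

r2_1_23 = multiple_r2(chain[:2])
r2_1_234 = multiple_r2(chain)
print(round(r2_1_23, 4), round(r2_1_234, 4), round(r2_1_234 ** 0.5, 4))
```

The two functions agree because $1 - R^2$ is multiplied by $(1 - r^2)$ each time another variable is brought in.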

Rearranging the formula for $R_{1.23}^2$ given on page 544, we may also write

$$1 - R_{1.23}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2).$$

$$R_{1.23}^2 = 1 - [(1 - r_{12}^2)(1 - r_{13.2}^2)].$$

This expression may be put into a general form for m variables by writing

$$R_{1.234 \ldots m}^2 = 1 - [(1 - r_{12}^2)(1 - r_{13.2}^2)(1 - r_{14.23}^2) \cdots (1 - r_{1m.23 \ldots (m-1)}^2)].$$

A variation of this expression is

$$R_{1.234 \ldots m}^2 = 1 - [(1 - R_{1.234 \ldots (m-1)}^2)(1 - r_{1m.23 \ldots (m-1)}^2)].$$

Coefficients of estimation and standard errors of estimate.
When only the values of the zero-order coefficients are known, it is not
feasible to undertake to ascertain the various $b$ values and the standard
error of estimate. However, if $s_1$, or $\Sigma x_1^2$ and $N$, are known, we
can obtain the standard error of estimate from

$$s_{1.234 \ldots m} = s_1\sqrt{1 - R_{1.234 \ldots m}^2}.$$

To compute the coefficients of estimation requires a knowledge of other
standard errors of estimate. Thus,

$$b_{1m.23 \ldots (m-1)} = r_{1m.23 \ldots (m-1)}\,\frac{s_{1.23 \ldots (m-1)}}{s_{m.23 \ldots (m-1)}}.$$

OTHER MEASURES OF THE INDIVIDUAL IMPORTANCE
OF THE INDEPENDENT VARIABLES

We have already considered the coefficients of partial determination or
correlation as measures of the individual importance of the three
independent variables. Two other measures of the individual importance
of the independent variables are occasionally used.

Beta coefficients. It will be remembered that the following
relationship was used in simple correlation:

$$r_{12} = b_{12}\,\frac{s_2}{s_1}.$$

The beta coefficients are akin to this expression, being written

$$\beta_{12.34} = b_{12.34}\,\frac{s_2}{s_1}; \qquad
  \beta_{13.24} = b_{13.24}\,\frac{s_3}{s_1}; \qquad \text{and} \qquad
  \beta_{14.23} = b_{14.23}\,\frac{s_4}{s_1}.$$

The reader should not confuse these measures with $\beta_1$ and $\beta_2$ used
to describe a frequency distribution. The two sets of measures are entirely
different in nature.
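For purposes of computation, we shall write

$$\frac{s_2}{s_1} = \sqrt{\frac{\Sigma x_2^2}{\Sigma x_1^2}},$$

and similarly for the other ratios, giving

$$\beta_{12.34} = b_{12.34}\sqrt{\frac{\Sigma x_2^2}{\Sigma x_1^2}}
               = +0.53634\sqrt{\frac{109.51}{103.78}} = +0.5509.$$

$$\beta_{13.24} = b_{13.24}\sqrt{\frac{\Sigma x_3^2}{\Sigma x_1^2}}
               = +2.20484\sqrt{\frac{8.44}{103.78}} = +0.6288.$$

$$\beta_{14.23} = b_{14.23}\sqrt{\frac{\Sigma x_4^2}{\Sigma x_1^2}} = +0.4467,
  \qquad \text{where } b_{14.23} = +0.05906.$$

The ranks of the three $\beta$ coefficients are the same, for our problem,
as were the ranks of the corresponding partial coefficients. This will
usually, although not always, be the case.

The expression for $\beta_{1m.23 \ldots (m-1)}$ may be written:

$$\beta_{1m.23 \ldots (m-1)} = b_{1m.23 \ldots (m-1)}\,\frac{s_m}{s_1}.$$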


Coefficients of separate determination. If the expression

$$R_{1.234}^2 = \frac{\Sigma x_{c1.234}^2}{\Sigma x_1^2}
             = \frac{b_{12.34}\Sigma x_1x_2 + b_{13.24}\Sigma x_1x_3 + b_{14.23}\Sigma x_1x_4}{\Sigma x_1^2}$$

be divided into three parts, designated $d_{12.34}^2$, $d_{13.24}^2$, and
$d_{14.23}^2$, so that

$$d_{12.34}^2 = \frac{b_{12.34}\Sigma x_1x_2}{\Sigma x_1^2} = \frac{(0.53634)(63.90)}{103.78} = 0.33024;$$

$$d_{13.24}^2 = \frac{b_{13.24}\Sigma x_1x_3}{\Sigma x_1^2} = \frac{(2.20484)(8.68)}{103.78} = 0.18441; \text{ and}$$

$$d_{14.23}^2 = \frac{b_{14.23}\Sigma x_1x_4}{\Sigma x_1^2} = 0.24573;$$

we have three coefficients of separate determination, which, when added,
give $R^2$. That is,

$$R_{1.234}^2 = d_{12.34}^2 + d_{13.24}^2 + d_{14.23}^2.$$

$$0.7604 = 0.33024 + 0.18441 + 0.24573.$$

The expression for $d_{1m.23 \ldots (m-1)}^2$ is

$$d_{1m.23 \ldots (m-1)}^2 = \frac{b_{1m.23 \ldots (m-1)}\Sigma x_1x_m}{\Sigma x_1^2}.$$

Although the $d^2$ values may be added to produce $R^2$, they have several
shortcomings,** one of which is that they are believed to be more subject
to sampling error than either the partial coefficients or the beta
coefficients. Furthermore, the $d^2$ values measure not only the determination
attributable to the independent variable to the left of the decimal in the
subscript, but also a portion of the joint determination of the other
independent variables.

It may be of interest that $d_{12.34}^2 = \beta_{12.34}r_{12}$ and that similar
expressions may be written for other coefficients of separate determination.
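The identity between the separate-determination, beta, and zero-order coefficients gives a quick check on the figures of this section. The sketch below multiplies each beta coefficient by its zero-order coefficient; the dictionary keys are illustrative labels, and because the inputs are the four-place values printed in the text, agreement with the printed $d^2$ values is only to about three decimals.

```python
# Beta coefficients and zero-order correlations quoted in the text,
# keyed by the subscript of the corresponding d^2 value.
betas = {"12.34": 0.5509, "13.24": 0.6288, "14.23": 0.4467}
zero_order = {"12.34": 0.5994, "13.24": 0.2933, "14.23": 0.5514}

# Coefficient of separate determination = beta coefficient x zero-order r.
separate = {key: betas[key] * zero_order[key] for key in betas}
total = sum(separate.values())   # should reproduce R^2 for all three variables

for key, value in separate.items():
    print(f"d^2_{key} = {value:.4f}")
print(f"sum = {total:.4f}")
```

The sum comes out close to the coefficient of multiple determination, 0.7604, which is the additive property the text attributes to the $d^2$ values.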

MULTIPLE CURVILINEAR CORRELATION

As in the case of relationships between two variables, the relationship
between a dependent variable and one or more independent variables is
sometimes non-linear. When this is true, we may use a polynomial or we
may transform one or more variables into logarithms, reciprocals, roots,
or powers, or convert in some other manner.

Polynomials. If the relationship between $X_1$ and $X_2$ appears to be
non-linear, while that between $X_1$ and $X_3$ is linear, the equation type

$$X_{c1.22'3} = a_{1.22'3} + b_{12.2'3}X_2 + b_{12'.23}X_2^2 + b_{13.22'}X_3$$

might be used. This equation would, presumably, result in a greater
amount of explained variation than would use of

$$X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3.$$

The increase in the amount of explained variation may be tested for
significance by using the methods described for partial coefficients of
determination in Chapter 26. A polynomial was used for a non-linear
multiple correlation analysis on pages 779-781 of the first edition of this
text.

Transformations. Using logarithms, reciprocals, roots, powers, or
some other function of the values of one (or more) of the series may result
in reducing a non-linear relationship to linear form. For example, an
estimating equation might be of one of the following types:

$$X_{c1.23} = a_{1.23} + b_{12.3}\log X_2 + b_{13.2}X_3;$$

$$X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}\sqrt{X_3};$$

$$X_{c1.23} = a_{1.23} + b_{12.3}\frac{1}{X_2} + b_{13.2}X_3;$$

$$\log X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3.$$

Various combinations are also possible. When using a transformation,
one should, if possible, formulate a hypothesis concerning the nature of
the relationship between the variables, as was done in the case of the

$$(\sqrt{Y})_c = a + bX$$

transformation employed for the data of ponderosa pine trees in Chapter 20.

** For a discussion of these and other points, see M. Ezekiel, Methods of
Correlation Analysis, John Wiley and Sons, New York, 1941, second edition,
pp. 498-500.

Graphic Method. Statisticians in the United States Department of
Agriculture have developed an extremely flexible technique by which
curves of net relationship and a coefficient of multiple correlation may be
obtained through successive approximations by means of charts and use
of mathematics no more advanced than simple arithmetic. While this
method has distinct limitations, it is useful as an exploratory tool in
determining the appropriate type of equation to fit by mathematical
methods.

Although the graphic method is extremely flexible, it is also highly
subjective. Rarely would two statisticians obtain curves exactly alike
from the same data. Consequently, good results can be obtained only by
persons of experience and good judgment. This is in contrast to the
mathematical procedure based on the method of least squares, in which
case (barring mistakes) only one possible result can be had for a given
equation type. A practical difficulty is also inherent in the graphic
method when a large number of variables is employed. The graphic
approach is not explained in this edition of this text, but the interested
reader is referred to pages 784-789 of the first edition.



Symbols Used in Chapter 22

$a$: the value of $Y_c$ when $X = 0$ in the equation $Y_c = a + bX$.

$a_{1.23}$: value of $X_{c1.23}$ when $X_2 = 0$ and $X_3 = 0$ in the estimating
equation $X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3$.

$a_{2.13}$: value of $X_{c2.13}$ when $X_1 = 0$ and $X_3 = 0$ in the estimating
equation $X_{c2.13} = a_{2.13} + b_{21.3}X_1 + b_{23.1}X_3$.

$b$: coefficient of $X$ in the equation $Y_c = a + bX$.

$b_{12.3}$: coefficient of $X_2$ in the estimating equation shown above for
$a_{1.23}$.

$b_{13.2}$: coefficient of $X_3$ in the estimating equation shown above for
$a_{1.23}$.

$b_{21.3}$: coefficient of $X_1$ in the estimating equation shown above for
$a_{2.13}$.

$b_{23.1}$: coefficient of $X_3$ in the estimating equation shown above for
$a_{2.13}$.

$N$: the number of pairs of items for two-variable correlation; the number
of sets of items for multiple and partial correlation.

$r$: coefficient of correlation. $r_{12}$, $r_{13}$, and $r_{23}$ are
coefficients referring, respectively, to $X_1$ and $X_2$, to $X_1$ and $X_3$,
and to $X_2$ and $X_3$.

$r_{12.3}$: coefficient of partial correlation, the values of $X_3$ being held
constant.

$s_x$: the standard deviation of the $x$ values.

$s_y$: the standard deviation of the $y$ values.

$\Sigma$: upper-case Greek sigma, meaning "take the sum of."

$x$: deviation of an $X$ value from the trend line for the $X$ values.

$X$: the $X$ series; also, an observed value in the $X$ series. Thus, we refer
to correlating $X$ and $Y$, but $\Sigma X$ means "sum the values in the $X$
series."

$X_1$: the $X_1$ series; also, an observed value in the $X_1$ series. Thus, we
refer to correlating $X_1$ with $X_2$ or with $X_3$, or with both $X_2$ and
$X_3$, but $\Sigma X_1$ means "sum the values in the $X_1$ series."

$X_2$, $X_3$: respectively, the $X_2$ series and the $X_3$ series; also,
observed values in those series. See $X_1$.

$X_{c1.23}$: a computed value of the $X_1$ series when the estimating equation
shown above for $a_{1.23}$ is used.

$X_{c2.13}$: a computed value of the $X_2$ series when the estimating equation
shown above for $a_{2.13}$ is used.

$y$: deviation of a $Y$ value from the trend line for the $Y$ values.

$Y$: the $Y$ series; also, an observed value in the $Y$ series. Thus, we refer
to correlating $X$ and $Y$, but $\Sigma Y$ means "sum the values in the $Y$
series."

$Y_c$: a computed value of the $Y$ series.

CHAPTER 22

Correlation IV: Correlation of Time Series

The problem of correlating the cyclical fluctuations of two, or more,
time series is basically the same as that of correlating non-chronological
series. However, when correlating time series, we must take cognizance
of the fact that trend is usually present in annual data and that both
trend and seasonal variation, as well as irregular fluctuations, may be
found in monthly data.


ANNUAL DATA

Table 22.1 shows data of the production of by-product (or oven) coke
and of beehive coke in the United States for each year, 1941 through 1952.
From the numerical data, little can be grasped concerning the behavior
of the two series; but when the two series are shown graphically in Charts
22.1 and 22.2, it is apparent that: (1) the trend of by-product coke
production is upward, (2) the trend of beehive coke production is downward,
and (3) the fluctuations of the two series are positively correlated.

Correlation of data unadjusted for trend. When correlating two
time series, we are interested in knowing whether the fluctuations of the
series move in the same direction or in opposite directions, and whether
the association is high or low. If our concern is with the trends of the
two series, rather than with their fluctuations, we would not correlate the
two trends, since they would of necessity show perfect linear or
non-linear correlation. Trends are compared either graphically or by
examining the trend equations. When time series data, unadjusted for
trend, are correlated, the resulting coefficient reflects both the relationship
existing between the fluctuations and that between the two trends. The
data of production of by-product and beehive coke are shown as a scatter
diagram in Chart 22.3 and the value of the correlation coefficient is found,
in Table 22.1, to be +0.456. This coefficient seems low in view of the
agreement of the fluctuations of the two series shown in Charts 22.1 and
22.2. The difficulty lies in the fact that the two trends are in opposite
directions. The effect of trend may be eliminated by correlating
percentages of trend instead of correlating the raw data. Alternatively, we
may compute the partial correlation coefficient $r_{12.3}$, where the two
series are $X_1$ and $X_2$ and where time is $X_3$. Sometimes the effect of
trend is

TABLE 22.1

Correlation of Production of By-product Coke and of Beehive Coke, in the
United States, 1941-1952

(Thousands of tons)

        By-product     Beehive
        coke           coke
        production     production
Year    X              Y                  XY               X^2            Y^2

1941     58,482         6,704        392,063,328    3,420,144,324     44,943,616
1942     62,295         8,274        515,428,830    3,880,667,025     68,459,076
1943     63,743         7,933        505,673,219    4,063,170,049     62,932,489
1944     67,065         6,973        467,644,245    4,497,714,225     48,622,729
1945     62,094         5,214        323,758,116    3,855,664,836     27,185,796
1946     53,929         4,568        246,347,672    2,908,337,041     20,866,624
1947     66,759         6,687        446,417,433    4,456,764,081     44,715,969
1948     68,284         6,578        449,172,152    4,662,704,656     43,270,084
1949     60,222         3,415        205,658,130    3,626,689,284     11,662,225
1950     66,891         5,827        389,773,857    4,474,405,881     33,953,929
1951     71,990         7,343        528,622,570    5,182,560,100     53,919,649
1952     63,631         4,601        292,766,231    4,048,904,161     21,169,201

Total   765,385        74,117      4,763,325,783   49,077,725,663    481,701,387

Data from U. S. Department of Commerce; the remainder of the source note is
illegible in this copy.

$$r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}
           {\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}$$

$$= \frac{12(4{,}763{,}325{,}783) - (765{,}385)(74{,}117)}
         {\sqrt{[12(49{,}077{,}725{,}663) - (765{,}385)^2][12(481{,}701{,}387) - (74{,}117)^2]}}
  = +0.456.$$


decreased by correlating either (1) the amounts of change from each year
to the next for the two series or (2) the percentages of change from each
year to the next for the two series. We shall examine each of these
procedures in turn.
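The unadjusted coefficient and the partial coefficient with time held constant can both be reproduced from the annual figures. The sketch below is illustrative only: the function names are assumptions, the two series are transcribed from Table 22.1 (two entries, 68,284 for 1948 and 7,343 for 1951, are read so that the columns agree with the printed totals), and the calendar year stands in for the time variable $X_3$.

```python
from math import sqrt

# Production in thousands of tons, 1941-1952, transcribed from Table 22.1.
by_product = [58482, 62295, 63743, 67065, 62094, 53929,
              66759, 68284, 60222, 66891, 71990, 63631]
beehive = [6704, 8274, 7933, 6973, 5214, 4568,
           6687, 6578, 3415, 5827, 7343, 4601]
years = list(range(1941, 1953))   # time, playing the part of X3


def pearson_r(xs, ys):
    """Product-moment r from raw sums, as in the formula under Table 22.1."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def partial_r(r12, r13, r23):
    """First-order partial coefficient r12.3 from three zero-order values."""
    return (r12 - r13 * r23) / (sqrt(1 - r13 ** 2) * sqrt(1 - r23 ** 2))


r_unadjusted = pearson_r(by_product, beehive)
r_time_held = partial_r(r_unadjusted,
                        pearson_r(by_product, years),
                        pearson_r(beehive, years))

print(f"unadjusted r        = {r_unadjusted:+.3f}")
print(f"r with time held    = {r_time_held:+.3f}")
```

Holding time constant removes the opposing linear trends, so the partial coefficient is much higher than the unadjusted one.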

Correlation of percentages of trend. Obviously, the first step
consists of determining an appropriate trend for each of the series. For
our illustration, linear trends will suffice, and Table 22.2 shows the
computation of the trend equation, the trend values, and the percentages of
trend for by-product coke. Similar computations are shown for beehive

Chart 22.1. Production of By-product Coke in the United States and
Straight-Line Trend, 1941-1952. Data from Table 22.2. (Vertical scale:
millions of short tons.)

Chart 22.2. Production of Beehive Coke in the United States and
Straight-Line Trend, 1941-1952. Data from Table 22.3.

Chart 22.3. Scatter Diagram of Production of By-product Coke and of Beehive
Coke, 1941-1952. Data of Table 22.1.

TABLE 22.2

Determination of Trend and Computation of Per-Cent-of-Trend Values for
Production of By-product Coke, 1941-1952

                Production                      Trend       Per cent
                (000 short                      values      of trend
Year      X     tons) Y            XY           Y_c         (Y ÷ Y_c)

1941     -11     58,482        -643,302       60,645.3       96.43
1942      -9     62,295        -560,655       61,215.6      101.76
1943      -7     63,743        -446,201       61,785.9      103.17
1944      -5     67,065        -335,325       62,356.2      107.55
1945      -3     62,094        -186,282       62,926.6       98.68
1946      -1     53,929         -53,929       63,496.9       84.93
1947       1     66,759          66,759       64,067.2      104.20
1948       3     68,284         204,852       64,637.6      105.64
1949       5     60,222         301,110       65,207.9       92.35
1950       7     66,891         468,237       65,778.2      101.69
1951       9     71,990         647,910       66,348.6      108.50
1952      11     63,631         699,941       66,918.9       95.09

Total      0    765,385         163,115

Data from sources given below Table 22.1.

$N = 12$.
$\Sigma X^2 = 2(286) = 572$.

TABLE 22.3

Determination of Trend and Computation of Per-Cent-of-Trend Values for
Production of Beehive Coke, 1941-1952

                Production                      Trend       Per cent
                (000 short                      values      of trend
Year      X     tons) Y            XY           Y_c         (Y ÷ Y_c)

1941     -11      6,704         -73,744        7,288.6       91.98
1942      -9      8,274         -74,466        7,086.4      116.76
1943      -7      7,933         -55,531        6,884.2      115.23
1944      -5      6,973         -34,865        6,682.0      104.35
1945      -3      5,214         -15,642        6,479.7       80.47
1946      -1      4,568          -4,568        6,277.5       72.77
1947       1      6,687           6,687        6,075.3      110.07
1948       3      6,578          19,734        5,873.1      112.00
1949       5      3,415          17,075        5,670.9       60.22
1950       7      5,827          40,789        5,468.7      106.55
1951       9      7,343          66,087        5,266.5      139.43
1952      11      4,601          50,611        5,064.2       90.85

Total      0     74,117         -57,833

Data from sources given below Table 22.1.

$N = 12$.
$\Sigma X^2 = 2(286) = 572$.
$a = \dfrac{\Sigma Y}{N} = \dfrac{74{,}117}{12} = 6{,}176.42$.
$b = \dfrac{\Sigma XY}{\Sigma X^2} = \dfrac{-57{,}833}{572} = -101.107$.
$Y_c = 6{,}176.42 - 101.107X$.
Origin, between 1946 and 1947.
$X$ units, 1/2 year.



Chart 22.4. Percentages of Trend of By-product Coke and of Beehive Coke
Production, 1941-1952. Data of Tables 22.2 and 22.3.

coke in Table 22.3. The two sets of per-cent-of-trend data have been
plotted in Chart 22.4, where it may be seen that whenever one series is
above (or below) its trend line, the other series is also above (or below) its
trend line. Chart 22.4 gives us no adequate picture of the closeness of
the relationship; that purpose is served by Chart 22.5, which is a scatter

Chart 22.5. Scatter Diagram of Percentages of Trend of Production of
By-product Coke and of Beehive Coke, 1941-1952. Data of Table 22.4.
(Vertical scale: beehive coke, per cent of trend. Horizontal scale:
by-product coke, per cent of trend.)

diagram of the two series of percentages of trend. From this scatter plot
it is clear that fairly high positive correlation is present between the
percentages of trend for the two series, and the value of r is found, in Table
22.4, to be +0.838.
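The whole procedure, fitting a straight-line trend to each series, expressing each year as a percentage of its trend value, and correlating the two sets of percentages, can be sketched in a few lines. The function names below are illustrative assumptions; the data are the annual figures of Tables 22.2 and 22.3 (thousands of short tons), with the 1948 by-product and 1951 beehive entries read so that the columns agree with the printed totals.

```python
from math import sqrt

# Annual production, 1941-1952, thousands of short tons.
by_product = [58482, 62295, 63743, 67065, 62094, 53929,
              66759, 68284, 60222, 66891, 71990, 63631]
beehive = [6704, 8274, 7933, 6973, 5214, 4568,
           6687, 6578, 3415, 5827, 7343, 4601]


def linear_trend(values):
    """Least-squares straight line with time coded so that sum(X) = 0.

    For an even number of years the codes are the odd integers
    ..., -3, -1, 1, 3, ... used in the tables (X units of half a year).
    """
    n = len(values)
    codes = [2 * i - (n - 1) for i in range(n)]   # -11, -9, ..., 11 for n = 12
    a = sum(values) / n
    b = sum(x * y for x, y in zip(codes, values)) / sum(x * x for x in codes)
    return [a + b * x for x in codes]


def per_cent_of_trend(values):
    """Each observation as a percentage of its trend value."""
    return [100.0 * y / yc for y, yc in zip(values, linear_trend(values))]


def pearson_r(xs, ys):
    """Product-moment correlation from deviations about the means."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


pct_by_product = per_cent_of_trend(by_product)
pct_beehive = per_cent_of_trend(beehive)
r_pct = pearson_r(pct_by_product, pct_beehive)

print([round(p, 2) for p in pct_by_product])
print([round(p, 2) for p in pct_beehive])
print(f"r of percentages of trend = {r_pct:+.3f}")
```

Because the trend values are carried at full precision here, individual percentages can differ from the printed tables by a unit in the last place.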

The situation pictured in the foregoing tables and charts is but one of
four possibilities.¹ They are:

1. The fluctuations of two time series may be positively correlated,
but the trends may be in opposite directions. Correlating the data
without adjusting for trend, instead of correlating percentages of trend, will
result in lowering the positive correlation coefficient or may even change
it to a negative coefficient, if the trends are marked in relation to the
fluctuations. In the preceding illustration, r = +0.838 for the
per-cent-of-trend data, while r = +0.456 for the unadjusted production data in
tons.

2. The fluctuations of two time series may be positively correlated,
and the trends may be in the same direction. Correlating the data
without adjusting for trend, instead of correlating percentages of trend,
will result in increasing the positive correlation coefficient. (If the
percentages of trend showed r = +1.0, ignoring the trends and correlating
the unadjusted data could not result in a higher value for r.) Although
the data cover an extremely short period, the production of pig iron and
the production of steel ingots and steel for castings for 1946-1952 will
serve to illustrate the principle involved. Table 22.5 shows the data, the
behavior of which may be seen in Chart 22.6. Chart 22.6 also shows the
trends of the two series, both of which are upward. It is apparent from
the chart that the fluctuations of the two series about their trends have a
high positive correlation. Correlating, first, the unadjusted data, we
find, in Table 22.5, that r = +0.995. When the two series are each put
in terms of percentages of trend, the values are those shown in Table 22.6.
This table shows, also, that correlating the per-cent-of-trend data yields
r = +0.994. The per-cent-of-trend figures are so closely related that
ignoring the trends could not increase the coefficient very much!

¹ Throughout the discussion in this chapter, we consider only linear trends and
linear correlation. When dealing with non-linear trends and/or non-linear
correlation of fluctuations, the results of failing to eliminate trend cannot be
so simply stated as when only linear relationships are involved. However, if a
trend is non-linear, it is just as important that its effect be eliminated as
if the trend were linear.

TABLE 22.4

Correlation of Percentages of Trend of Production of By-product Coke and of


TABLE 22.5

Correlation of Production of Pig Iron and Production of Steel Ingots and
Steel for Castings, 1946-1952

(Millions of tons)

                      Steel ingots
         Pig iron     and steel
Year        X         for castings        XY            X²            Y²
                           Y

1946       45.6           66.6         3,036.96      2,079.36      4,435.56
1947       59.3           84.9         5,034.57      3,516.49      7,208.01
1948       61.0           88.6         5,404.60      3,721.00      7,849.96
1949       54.2           78.0         4,227.60      2,937.64      6,084.00
1950       65.4           96.8         6,330.72      4,277.16      9,370.24
1951       71.2          105.2         7,490.24      5,069.44     11,067.04
1952       62.2           93.2         5,797.04      3,868.84      8,686.24

Total     418.9          613.3        37,321.73     25,469.93     54,701.05

Data from U. S. Department of Commerce, Office of Business Economics, ...
Biennial Edition, pp. 108 and 109.

$$
r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}
         {\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}
  = \frac{7(37{,}321.73) - (418.9)(613.3)}
         {\sqrt{[7(25{,}469.93) - (418.9)^2][7(54{,}701.05) - (613.3)^2]}}
  = +0.995.
$$

3. The fluctuations of two time series may be negatively correlated,
but the trends may be in the same direction. Correlating the data
without adjusting for trend, instead of correlating percentages of trend,
will result in lowering the negative correlation coefficient or may even
change it to a positive coefficient if the trends are pronounced in relation
to the fluctuations.

4. The fluctuations of two time series may be negatively correlated and
the trends may be in opposite directions. Correlating the data without
adjusting for trend, instead of correlating percentages of trend, will result
in increasing the negative correlation coefficient. (If the percentages of
trend showed r = -1.0, ignoring the trends and correlating the
unadjusted data could not result in a higher value for r.)
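
The first of the four possibilities may be illustrated with made-up figures.
The sketch below is an illustration only and uses none of the coke or steel
data: the cyclical pattern, the two straight-line trends, and the helper name
`pearson_r` are all assumptions chosen for the example. Two series share the
same fluctuations but have trends in opposite directions, and the coefficient
for the unadjusted data is compared with that for the percentages of trend.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


# Hypothetical cyclical pattern (per cent of trend) shared by both series.
cycle = [96, 104, 108, 101, 93, 95, 103, 107, 100, 94]
periods = range(len(cycle))

# Hypothetical straight-line trends running in opposite directions.
trend_a = [100 + 4 * t for t in periods]   # upward trend
trend_b = [200 - 5 * t for t in periods]   # downward trend

# Unadjusted data: trend value times the cyclical percentage.
series_a = [tr * c / 100 for tr, c in zip(trend_a, cycle)]
series_b = [tr * c / 100 for tr, c in zip(trend_b, cycle)]

r_unadjusted = pearson_r(series_a, series_b)

# Percentages of trend: each figure divided by its own trend value.
pct_a = [100 * v / tr for v, tr in zip(series_a, trend_a)]
pct_b = [100 * v / tr for v, tr in zip(series_b, trend_b)]
r_pct_of_trend = pearson_r(pct_a, pct_b)

print(round(r_unadjusted, 3), round(r_pct_of_trend, 3))
```

In this artificial case the percentages of trend are perfectly correlated,
while the opposing trends drive the coefficient for the unadjusted figures
below zero. Reversing the sign of one trend, or of one set of fluctuations,
gives the other three possibilities.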

If two time series are to be correlated, and if both series have horizontal
trends, it is, of course, not necessary to express the data as percentages of
trend. However, if one of the two series has an upward or downward
trend, a suitable correlation of the fluctuations of the two series will not
be obtained unless trend is eliminated from the series showing trend.

It occasionally happens that annual data for one series are regularly
known, or made available, before the corresponding yearly figure for
another, closely correlated series. In such a situation, if the correlation
is high, a useful estimate may be made for the series which is not so
promptly available. The procedure consists of: (1) expressing the figure
which is first available as a percentage of the extended trend for that
series, (2) estimating a per-cent-of-trend figure for the other series by use
of an estimating equation obtained from a table like Table 22.4, and (3)
converting this estimated per-cent-of-trend figure into the units in which
the series is expressed (tons, dollars, index numbers, and so on) by taking
the estimated per cent of trend of the extended trend value for that series.
We shall not give a numerical illustration of the foregoing, since most
series, including by-product and beehive coke, are available on a monthly
basis, and, when data are already known for eleven months of a year, an
estimate of the annual total for that series based only on the annual total
for another series can be of little use. It should be clear that the
procedure assumes a continuation of the relationship existing between the
two sets of fluctuations, and also a continuation of the two trend lines.

Chart 22.6. Production of Pig Iron and Production of Steel Ingots and
Steel for Castings, with Straight-Line Trends, 1946-1952. Data of production
from Table 22.5. The trends were computed from these figures. (Vertical
axis: millions of short tons.)
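
The three steps of the estimating procedure may be sketched as follows. This
is a minimal illustration, not a computation from the tables: the function
name, the constants `a` and `b` of the estimating equation, and every figure
passed to the function are hypothetical.

```python
def estimate_later_series(first_value, first_trend, a, b, other_trend):
    """Estimate the series that is not yet available from the one that is.

    first_value -- figure already available for the prompt series
    first_trend -- extended trend value of the prompt series for that year
    a, b        -- constants of an estimating equation relating the two sets
                   of per-cent-of-trend figures (hypothetical here)
    other_trend -- extended trend value of the series being estimated
    """
    # (1) Express the available figure as a percentage of its extended trend.
    pct_first = 100.0 * first_value / first_trend
    # (2) Estimate the per cent of trend for the other series.
    pct_other = a + b * pct_first
    # (3) Convert the estimated per cent of trend back into original units.
    return pct_other / 100.0 * other_trend


# Made-up figures: the prompt series stands at 66.0 against a trend of 60.0.
estimate = estimate_later_series(66.0, 60.0, a=10.0, b=0.9, other_trend=90.0)
print(round(estimate, 1))
```

Both trend values are extensions of past trends, so the estimate is only as
good as the assumption that the trends and the relationship between the
fluctuations continue unchanged.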

TABLE 22.6

Correlation of Percentages of Trend of Production of Pig Iron and Production
of Steel Ingots and Steel for Castings, 1946-1952

                      Steel ingots
         Pig iron     and steel
Year        X         for castings        XY            X²            Y²
                           Y

1946       88.5           90.2         7,982.70      7,832.25      8,136.04
1947      109.2          108.3        11,826.36     11,924.64     11,728.89
1948      106.8          106.7        11,395.56     11,406.24     11,384.89
1949       90.6           89.0         8,063.40      8,208.36      7,921.00
1950      104.5          105.0        10,972.50     10,920.25     11,025.00
1951      108.9          108.7        11,837.43     11,859.21     11,815.69
1952       91.2           91.9         8,381.28      8,317.44      8,445.61

Total     699.7          699.8        70,459.23     70,468.39     70,457.12

The percentages of trend were computed from the production data of Table 22.5,
using the trends shown in Chart 22.6.

$$
r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}
         {\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}
  = \frac{7(70{,}459.23) - (699.7)(699.8)}
         {\sqrt{[7(70{,}468.39) - (699.7)^2][7(70{,}457.12) - (699.8)^2]}}
  = +0.994.
$$
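
The two coefficients of Tables 22.5 and 22.6 may be checked with a short
computation. The sketch below takes the production figures of Table 22.5 and
assumes that the straight-line trends of Chart 22.6 were fitted by least
squares with the origin at the middle of the period; that fitting method and
the function names are assumptions of the example. Because the trend values
are not rounded here, the percentages of trend differ slightly from those
printed in Table 22.6.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def straight_line_trend(values):
    """Least-squares straight-line trend values for equally spaced data."""
    n = len(values)
    t = [i - (n - 1) / 2 for i in range(n)]   # origin at the middle of the period
    slope = sum(ti * v for ti, v in zip(t, values)) / sum(ti * ti for ti in t)
    mean = sum(values) / n
    return [mean + slope * ti for ti in t]


def pct_of_trend(values):
    """Each figure expressed as a percentage of its trend value."""
    return [100 * v / tr for v, tr in zip(values, straight_line_trend(values))]


# Production, 1946-1952, millions of tons (Table 22.5).
pig_iron = [45.6, 59.3, 61.0, 54.2, 65.4, 71.2, 62.2]
steel = [66.6, 84.9, 88.6, 78.0, 96.8, 105.2, 93.2]

r_unadjusted = pearson_r(pig_iron, steel)
r_pct_of_trend = pearson_r(pct_of_trend(pig_iron), pct_of_trend(steel))

print(round(r_unadjusted, 4), round(r_pct_of_trend, 4))
```

Both trends are upward, so the coefficient for the unadjusted data is the
larger of the two, though only slightly.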

Correlation of fluctuations when data have been divided by s. In Chapter
16 it was pointed out that time series having different amplitudes of
fluctuation are easier to compare graphically if each set of adjusted data is
divided by its standard deviation. When two series² of deviations have
been expressed in terms of their respective standard deviations, the
product-moment formula for the correlation coefficient becomes

$$
r = \frac{\Sigma xy}{N s_x s_y}
  = \frac{1}{N}\,\Sigma\left(\frac{x}{s_x}\cdot\frac{y}{s_y}\right).
$$

Thus we obtain r by merely (1) multiplying the paired values, (2) adding,
and (3) dividing by N. (Note that $s_X = s_x$ and $s_Y = s_y$, since
adding, or subtracting, a constant does not alter the value of s for a series
of values.) The data of by-product coke and beehive coke production
provide an excellent illustration, since it is apparent in Chart 22.4
that the fluctuations in beehive coke production are more pronounced,
in terms of percentages of trend, than the fluctuations in by-product
coke production. In fact, for 11 of the 12 years shown in Chart 22.4,
the beehive coke per-cent-of-trend values are farther removed from
the 100 line than are the by-product values. Furthermore, six of
the beehive coke fluctuations exceed the largest fluctuation shown
by by-product coke. In Table 22.7 the two series are expressed as
percentage deviations from trend, and the necessary computations
are made for the determination of the standard deviations. Below
the table it is seen that $s_x$, the standard deviation for by-product
coke, is 6.576, and that $s_y$, the standard deviation for beehive coke, is
20.873. Table 22.7 also shows the $x/s_x$ and $y/s_y$ values. These two
sets of values are shown, as time series, in Chart 22.7. What has been
accomplished by dividing each series by its standard deviation may be seen
by comparing Charts 22.7 and 22.4. If a scatter plot were to be drawn of
the $x/s_x$ and $y/s_y$ values, it would be exactly the same as Chart 22.5,
except that the scales would differ. Table 22.7 shows the computation of r
for the $x/s_x$ and $y/s_y$ values, and it is found to be +0.838, identical
with the value obtained in Table 22.4.

² The series may be chronological or non-chronological. For example, two sets of
paired grades expressed as deviations from their means and in terms of their standard
deviations (sometimes called standard scores) may be correlated as shown in Table 22.7.

TABLE 22.7

Correlation of Percentage Deviations from Trend Expressed in Terms of s
for By-product Coke and Beehive Coke, 1941-1952

            By-product coke                 Beehive coke
Year       x        x²      x/s_x        y         y²       y/s_y   (x/s_x)(y/s_y)

1941    - 3.57    12.7449  -0.541     - 8.02     64.3204   -0.384    + 0.207744
1942    + 1.76     3.0976  +0.268     +16.76    280.8976   +0.803    + 0.215204
1943    + 3.17    10.0489  +0.482     +15.23    231.9529   +0.730    + 0.351860
1944    + 7.55    57.0025  +1.148     + 4.35     18.9225   +0.208    + 0.238784
1945    - 1.32     1.7424  -0.201     -19.53    381.4209   -0.936    + 0.188136
1946    -15.07   227.1049  -2.292     -27.23    741.4729   -1.305    + 2.991060
1947    + 4.20    17.6400  +0.639     +10.07    101.4049   +0.482    + 0.307998
1948    + 5.64    31.8096  +0.858     +12.00    144.0000   +0.575    + 0.493350
1949    - 7.65    58.5225  -1.163     -39.78  1,582.4484   -1.906    + 2.216678
1950    + 1.69     2.8561  +0.257     + 6.55     42.9025   +0.314    + 0.080698
1951    + 8.50    72.2500  +1.293     +39.43  1,554.7249   +1.889    + 2.442477
1952    - 4.91    24.1081  -0.747     - 9.15     83.7225   -0.438    + 0.327186

Total            518.9275                      5,228.1904            +10.061175

The x and y values are the values in the last columns of Table 22.2 and Table 22.3
expressed as deviations from 100.00. The sum of the percentage deviations from a
trend line is ordinarily not exactly zero. However, if the trend has been fitted by
least squares to data covering the same period as the data under consideration, the
discrepancy may be expected to be so slight that it may be ignored. Including the
correction factors $(\Sigma x/N)^2$ and $(\Sigma y/N)^2$ below does not alter the
figures in the third decimal place for $s_x$ and $s_y$.

$$
s_x = \sqrt{\frac{\Sigma x^2}{N}} = \sqrt{\frac{518.9275}{12}} = 6.576.
$$

$$
s_y = \sqrt{\frac{\Sigma y^2}{N}} = \sqrt{\frac{5{,}228.1904}{12}} = 20.873.
$$

$$
r = \frac{1}{N}\,\Sigma\left(\frac{x}{s_x}\cdot\frac{y}{s_y}\right)
  = \frac{10.061175}{12} = +0.838.
$$

Chart 22.7. Production of By-product Coke and of Beehive Coke Expressed
as Percentage Deviations from Trend and in Terms of Their Standard
Deviations, 1941-1952. Data from Table 22.7. (Vertical axis: standard
deviations.)
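
The computation of Table 22.7 may be reproduced in a few lines. The sketch
below uses the percentage deviations x and y as they are read from that table;
the variable names are ours. As in the table, the standard deviations are
taken about zero rather than about the (very small) means of the deviations.

```python
import math

# Percentage deviations from trend, 1941-1952 (Table 22.7).
x = [-3.57, 1.76, 3.17, 7.55, -1.32, -15.07, 4.20, 5.64, -7.65, 1.69, 8.50, -4.91]
y = [-8.02, 16.76, 15.23, 4.35, -19.53, -27.23, 10.07, 12.00, -39.78, 6.55, 39.43, -9.15]
n = len(x)

# Standard deviations, ignoring the negligible correction factors.
s_x = math.sqrt(sum(v * v for v in x) / n)
s_y = math.sqrt(sum(v * v for v in y) / n)

# Each deviation expressed in units of its own standard deviation.
x_over_s = [v / s_x for v in x]
y_over_s = [v / s_y for v in y]

# (1) multiply the paired values, (2) add, (3) divide by N.
r = sum(a * b for a, b in zip(x_over_s, y_over_s)) / n

print(round(s_x, 3), round(s_y, 3), round(r, 3))
```

Small differences in the later decimal places from the figures of Table 22.7
are to be expected, since the table rounds each $x/s_x$ and $y/s_y$ value to
three decimals before multiplying.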

Correlation of unadjusted data with time as a third variable.
Another procedure for correlating the fluctuations of two time series
consists of determining the partial correlation existing between the two
series when time is held constant. The partial correlation coefficient which
is computed is $r_{12.3}$, where $X_1$ and $X_2$ are the two time series and
$X_3$ represents the years, which, for convenience, are taken with the origin
in the middle of the period. Table 22.8 shows the sums necessary for
determining $r_{12}$, $r_{13}$, and $r_{23}$ and, from these, $r_{12.3}$.
Note that all of the totals shown in Table 22.8 could have been obtained from
Tables 22.1, 22.2, and 22.3. From the computations shown below Table 22.8,
we find $r_{12.3}$ = +0.845.

TABLE 22.8

Computations for Partial and Multiple Correlation of Production of Beehive
Coke, X1, Production of By-product Coke, X2, and Time, X3, 1941-1952

(Production figures are in thousands of short tons.)

          Beehive    By-product    Time
Year        X1           X2         X3          X1X2          X1X3       X2X3

1941       6,704       58,482      -11       392,063,328    -73,744    -643,302
1942       8,274       62,295      - 9       515,428,830    -74,466    -560,655
1943       7,933       63,743      - 7       505,673,219    -55,531    -446,201
1944       6,973       67,065      - 5       467,644,245    -34,865    -335,325
1945       5,214       62,094      - 3       323,758,116    -15,642    -186,282
1946       4,568       53,929      - 1       246,347,672    - 4,568    - 53,929
1947       6,687       66,759        1       446,417,433      6,687      66,759
1948       6,578       68,284        3       449,172,152     19,734     204,852
1949       3,415       60,222        5       205,658,130     17,075     301,110
1950       5,827       66,891        7       389,773,857     40,789     468,237
1951       7,343       71,990        9       528,622,570     66,087     647,910
1952       4,601       63,631       11       292,766,231     50,611     699,941

Total     74,117      765,385        0     4,763,325,783    -57,833     163,115

Year          X1²              X2²

1941       44,943,616      3,420,144,324
1942       68,459,076      3,880,667,025
1943       62,932,489      4,063,170,049
1944       48,622,729      4,497,714,225
1945       27,185,796      3,855,664,836
1946       20,866,624      2,908,337,041
1947       44,715,969      4,456,764,081
1948       43,270,084      4,662,704,656
1949       11,662,225      3,626,689,284
1950       33,953,929      4,474,405,881
1951       53,919,649      5,182,560,100
1952       21,169,201      4,048,904,161

Total     481,701,387     49,077,725,663

Data from sources given below Table 22.1.

$\Sigma X_3^2 = 2(286) = 572.$

$$
r_{12} = \frac{N\Sigma X_1X_2 - (\Sigma X_1)(\Sigma X_2)}
              {\sqrt{[N\Sigma X_1^2 - (\Sigma X_1)^2][N\Sigma X_2^2 - (\Sigma X_2)^2]}}
       = \frac{12(4{,}763{,}325{,}783) - (74{,}117)(765{,}385)}
              {\sqrt{[12(481{,}701{,}387) - (74{,}117)^2][12(49{,}077{,}725{,}663) - (765{,}385)^2]}}
       = +0.456428.
$$

$$
r_{13} = \frac{N\Sigma X_1X_3 - (\Sigma X_1)(\Sigma X_3)}
              {\sqrt{[N\Sigma X_1^2 - (\Sigma X_1)^2][N\Sigma X_3^2 - (\Sigma X_3)^2]}}
       = \frac{12(-57{,}833) - (74{,}117)(0)}
              {\sqrt{[12(481{,}701{,}387) - (74{,}117)^2][12(572) - (0)^2]}}
       = -0.494381.
$$

$$
r_{23} = \frac{N\Sigma X_2X_3 - (\Sigma X_2)(\Sigma X_3)}
              {\sqrt{[N\Sigma X_2^2 - (\Sigma X_2)^2][N\Sigma X_3^2 - (\Sigma X_3)^2]}}
       = \frac{12(163{,}115) - (765{,}385)(0)}
              {\sqrt{[12(49{,}077{,}725{,}663) - (765{,}385)^2][12(572) - (0)^2]}}
       = +0.423071.
$$

$$
r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}
         = \frac{+0.456428 - (-0.494381)(+0.423071)}
                {\sqrt{1 - (-0.494381)^2}\,\sqrt{1 - (0.423071)^2}}
         = +0.845.
$$
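
The last step below Table 22.8 may be checked directly from the three simple
coefficients. The sketch below is a minimal illustration; the function name
`partial_r` is ours, and the three values of r are those computed below
Table 22.8.

```python
import math


def partial_r(r12, r13, r23):
    """First-order partial correlation of variables 1 and 2, holding 3 constant."""
    return (r12 - r13 * r23) / (math.sqrt(1 - r13 ** 2) * math.sqrt(1 - r23 ** 2))


r12 = 0.456428    # beehive coke with by-product coke
r13 = -0.494381   # beehive coke with time
r23 = 0.423071    # by-product coke with time

r12_3 = partial_r(r12, r13, r23)
print(round(r12_3, 3))
```

Because beehive coke has a downward trend and by-product coke an upward one,
the product of $r_{13}$ and $r_{23}$ is negative, and holding time constant
raises the coefficient well above the +0.456 found for the unadjusted data.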

If it were desired to express the relationship existing between the three
variables by means of a multiple estimating equation, such as was used
in Chapter 21, and if beehive coke were the dependent³ variable $X_1$, we
would use the equation type

$$
X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3,
$$

where, as in Table 22.8, $X_2$ refers to the production of by-product coke
and $X_3$ is time, with the origin for $X_3$ between 1946 and 1947 and the
$X_3$ units one-half year. If such an equation is used to estimate an annual
figure for one series from a more promptly available figure for another
series, it assumes the continuation of the straight-line trends for both
series and a continuation of the same relationship between the fluctuations
of the two series.

It is of more than passing interest that the partial and multiple correla-
tion analysis set forth in Table 22.8 is exactly the same as if we were to
correlate the amounts of deviation from the trends in Tables 22.2 and 22.3.
To demonstrate this, Table 22.9 has been made which shows the absolute
deviations from trend for by-product coke and for beehive coke. Below
Table 22.9 it is seen that, when the absolute deviations from trend are
correlated, r = +0.845, the same value obtained for $r_{12.3}$ in Table 22.8.

Since the multiple and partial correlation procedure produces the same
results as correlating absolute differences from trend, the former
procedure is subject to the same disadvantage as the latter. This
disadvantage was noted on pages 365-367, where it was pointed out that
relative deviations from trend are usually more meaningful than absolute
deviations from trend. The fact that the value of r obtained for the absolute
deviations from trend is slightly larger than that for the percentages of
trend should not be construed as an argument in favor of using absolute
deviations from trend. One or a few large absolute deviations would
have a marked effect on the value of r, as noted in Chapter 19 (see Charts
19.9 and 19.10 and accompanying discussion).

Correlation of amounts of change or percentages of change.
Occasionally, the relationship between the fluctuations of two time series
may be studied by computing the amount of change from each year to the
following year for both series and then correlating the paired amounts of
change, which will have positive and negative values. This procedure is
not recommended since: (1) using amounts of change results in the loss of
one pair of values and (2) if the trend is non-linear, the first differences
of values fluctuating around that trend will still contain a trend element.
This trend element could even be in the opposite direction to the original
trend.

Alternatively, percentages of change may be computed for each of the
two series and the paired percentages may be correlated. Here again, we
would have one fewer pair of values than the number of years involved.
Also, the percentages of change would still contain an element of trend if
the trend for a series were not an exponential curve (page 291).

Note that in both of these procedures different functions of the basic
data than those previously discussed would be correlated.

³ If by-product coke were the dependent variable, the equation would be

$$
X_{c2.13} = a_{2.13} + b_{21.3}X_1 + b_{23.1}X_3,
$$

or the identification of variables $X_1$ and $X_2$ could be interchanged and
the equation given above could be used.
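
The trend element that survives in amounts of change and in percentages of
change may be seen from a small example. The three trend curves below are
made up for the purpose and contain no fluctuations at all, so whatever
pattern remains in the changes is pure trend; the function names are ours.

```python
def first_differences(values):
    """Amount of change from each period to the following period."""
    return [b - a for a, b in zip(values, values[1:])]


def pct_changes(values):
    """Percentage change from each period to the following period."""
    return [100 * (b - a) / a for a, b in zip(values, values[1:])]


periods = range(8)
quadratic_trend = [100 + 2 * t + 3 * t * t for t in periods]   # non-linear trend
exponential_trend = [100 * 1.05 ** t for t in periods]         # constant 5 per cent growth
linear_trend = [100 + 10 * t for t in periods]                 # straight-line trend

# First differences of a non-linear trend still rise steadily.
diffs_quadratic = first_differences(quadratic_trend)

# Percentages of change are free of trend only for the exponential curve.
pcts_exponential = pct_changes(exponential_trend)
pcts_linear = pct_changes(linear_trend)

print(diffs_quadratic)
print([round(p, 2) for p in pcts_exponential])
print([round(p, 2) for p in pcts_linear])
```

The percentages of change for the straight line fall from period to period
although the line itself rises, which illustrates how the remaining trend
element may run in the opposite direction to the original trend.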

TABLE 22.9

Correlation of Absolute Deviations from Trend of Production of By-product
Coke and of Beehive Coke, 1941-1952

(Thousands of short tons)

         By-product    Beehive
Year         X            Y              XY                X²               Y²

1941     -2,163.3     -  584.6      1,264,665.18      4,679,866.89       341,757.16
1942     +1,079.4     +1,187.6      1,281,895.44      1,165,104.36     1,410,393.76
1943     +1,957.1     +1,048.8      2,052,606.48      3,830,240.41     1,099,981.44
1944     +4,708.8     +  291.0      1,370,260.80     22,172,797.44        84,681.00
1945     -  832.6     -1,265.7      1,053,821.82        693,222.76     1,601,996.49
1946     -9,567.9     -1,709.5     16,356,325.05     91,544,710.41     2,922,390.25
1947     +2,691.8     +  611.7      1,646,574.06      7,245,787.24       374,176.89
1948     +3,646.4     +  704.9      2,570,347.36     13,296,232.96       496,884.01
1949     -4,985.9     -2,255.9     11,247,691.81     24,859,198.81     5,089,084.81
1950     +1,112.8     +  358.3        398,716.24      1,238,323.84       128,378.89
1951     +5,641.4     +2,076.5     11,714,367.10     31,825,393.96     4,311,852.25
1952     -3,287.9     -  463.2      1,522,955.28     10,810,286.41       214,554.24

Total    +    0.1     -    0.1     52,480,226.62    213,361,165.49    18,076,131.19

The deviations were obtained from the production and trend data of Tables 22.2 and 22.3.

$$
r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}
         {\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}
  = \frac{12(52{,}480{,}226.62) - (0.1)(-0.1)}
         {\sqrt{[12(213{,}361{,}165.49) - (0.1)^2][12(18{,}076{,}131.19) - (-0.1)^2]}}
  = +0.845.
$$


Problems in correlating time series. It must be evident that the
value of the correlation coefficient is affected by the type of trend fitted to
the data, and by the period to which it is fitted. If a period of 10 years
is being correlated, it would not be logical to use for one series a section
of a trend fitted over a 100-year period and for the other a trend fitted to
data extending over 10 years only. The former trend would, in all
likelihood, fail to pass through the approximate center of each cycle, and
might not even touch some of the cycles. Consequently, the correlation
coefficient might understate or overstate the degree of relationship between
the cycles of the two series. It must also be apparent that the use of an
inflexible trend for one series and a flexible trend for the other would
produce similar results. If we wish to correlate cyclical movements, it
seems best therefore to use a trend that goes approximately through the
center of each cycle. It may be that no simple mathematical curve will
be satisfactory and that a relatively subjective method may have to be
resorted to, at least as a first approximation.

Another problem to consider is whether the Pearsonian method of
correlation, based on the second moments, is appropriate for correlating time
series. The fluctuations of a time series are not usually distributed
normally around the trend line. There are sometimes a few extreme
deviations, which, when squared, largely determine the value of r. With
this problem in mind, some authorities suggest the use of the rank method
when the extreme deviations are particularly large. Another solution is
the use of a formula based on first moments, rather than second.⁴ In
view of the fact that interest frequently centers in whether two series are
moving in the same general direction (positive or negative) at the same
time, without regard to the magnitude either of their level or of their
change, it may be that a method applicable to 2 X 2 tables (see pages
480-482) would be appropriate.
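
Two of the alternatives just mentioned may be sketched with the percentage
deviations of Table 22.7. The example below is an illustration under stated
assumptions: it takes "the rank method" to mean the usual rank-difference
coefficient, $1 - 6\Sigma d^2 / [N(N^2 - 1)]$, which applies as written only
when there are no tied values (there are none here), and it merely counts
agreements in sign as the raw material for a 2 X 2 table. The function names
are ours.

```python
def ranks(values):
    """Rank of each value, 1 for the smallest (assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def rank_correlation(xs, ys):
    """Rank-difference coefficient: 1 - 6*sum(d^2) / (N*(N^2 - 1))."""
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Percentage deviations from trend, 1941-1952 (Table 22.7).
x = [-3.57, 1.76, 3.17, 7.55, -1.32, -15.07, 4.20, 5.64, -7.65, 1.69, 8.50, -4.91]
y = [-8.02, 16.76, 15.23, 4.35, -19.53, -27.23, 10.07, 12.00, -39.78, 6.55, 39.43, -9.15]

rho = rank_correlation(x, y)

# Years in which both series are on the same side of their trend lines.
same_sign = sum(1 for a, b in zip(x, y) if a * b > 0)

print(round(rho, 3), same_sign, len(x))
```

The rank coefficient gives the extreme deviations of 1946, 1949, and 1951 no
more weight than any other year, and the count of like signs ignores
magnitude altogether.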

A further difficulty in correlating time series is that we have no logical
basis for estimating the reliability of the coefficient of correlation. The
chief objection to the use of any reliability test for r for time series is that
the different observations are not randomly distributed; each observation
in a time series is related to values in that series for preceding and
subsequent points of time. Furthermore, we cannot ordinarily generalize
concerning the exact nature of the interrelationship. Perhaps this
difficulty will become more obvious when we ask how many independent
observations are contained in the cyclical relatives used in Table 22.7.

⁴ See "Validity of Correlation in Time Sequences and a New Coefficient of
Similarity," by O. Gressens and E. D. Mouzon, Jr., Journal of the American Statistical
Association, Vol. XXII, December 1927, pp. 483-492. This method is further
elucidated and its relation to r explained by G. R. Davies, in an article entitled
"First Moment Correlation," appearing in the Journal of the American Statistical
Association, Vol. XXV, December 1930, pp. 413-427. The formula is

^ i’s(2,V - L'Mi 

= - > 

where s refers to the smaller of each pair of items when each series is expressed as
deviations from the mean in terms of average deviations
$\left(\frac{x}{AD_x} \text{ and } \frac{y}{AD_y}\right)$. When
summing algebraically, s is positive if the signs of the paired deviations are alike, and
negative if they are unlike.
Although there are 12 years, there are not 12 independent observations.
There are only three complete cycles (measuring from trough to trough).
Are there, then, only three independent observations? There are more
than three, since each observation in a cycle is not completely dependent
on the preceding values. If we now had monthly data, would we have
144 independent observations for the 12 years? Of course not. But
how many we would have, it is impossible to say. What has just been
said may be clearer when the reader understands the concept of "degrees
of freedom." This is discussed in Chapter 24 and again, with particular
reference to correlation, in Chapter 20.

All of the preceding illustrations have dealt with chronological series
expressed in physical terms. None were in monetary units. When a
series is in terms of dollars, it should ordinarily be adjusted for price
changes by dividing by an appropriate price index. Such a situation is
encountered when we examine the relationship existing between the price
and production of an agricultural crop such as oats, hay, wheat, or citrus
fruits. The correlation present may be between price and production
for the same years or between price for each year and production for the
following year.

The foregoing discussion has dealt only with correlation of two time
series, although it was mentioned at the outset that we might correlate
two or more time series. If one is undertaking to explain, statistically,
the annual fluctuations in the price of pork, he would undoubtedly bring
into his analysis not only the production of pork, but the price and
production of corn, and probably the price and production of beef and other
meats. A problem of this type is more complicated than those which
we have considered here, since multiple correlation of several variables is
involved. However, the procedures are exactly those set forth for
multiple and partial correlation in Chapter 21. Whatever the number of
variables being considered, appropriate adjustment must be made for the
trend of each series.

MONTHLY DATA 

When correlating monthly time series, it is necessary, not only to
adjust for trend, but to deseasonalize the data as well. If the data were
not deseasonalized, we would be, to a large extent, merely correlating the
seasonal fluctuations instead of the cyclical movements. In addition,
it is also usually desirable to smooth the adjusted data by means of a
short-term moving average (as explained in Chapter 16) in order to
remove the irregularities due to accidental movements.
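
A short-term moving average of the kind referred to may be sketched as
follows. The weights 1, 2, 1 are those of the three-month average used later
in this chapter for the Federal Reserve indexes; the function name and the
six monthly figures are hypothetical and serve only to show the arithmetic.

```python
def weighted_moving_average(values, weights=(1, 2, 1)):
    """Centered weighted moving average; the end points are dropped."""
    half = len(weights) // 2
    total = sum(weights)
    return [
        sum(w * values[i + k - half] for k, w in enumerate(weights)) / total
        for i in range(half, len(values) - half)
    ]


# Hypothetical monthly percentage deviations, already adjusted for
# trend and seasonal variation.
adjusted = [2.0, -1.0, 3.0, 0.0, -2.0, 1.0]

smoothed = weighted_moving_average(adjusted)
print(smoothed)
```

One month is lost at each end of the series, so a smoothed series of this
kind is two months shorter than the adjusted data from which it was computed.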

Synchronous relationships. Sometimes one is interested in
correlating two monthly time series in order to ascertain whether the two
move together. Thus, such a correlation might be made if two
organizations issue indexes purporting to measure the same aspect of economic
activity. Or, a research bureau may be interested in knowing whether
an index of business conditions, computed upon the basis of a few
component series, agrees closely enough in its depicting of cyclical movements
with a more comprehensive index which is also more expensive to
construct. Again, one may be interested in comparing time series (for
example, department store sales) for two, or more, of the twelve Federal
Reserve districts.

Lag and lead. Frequently one is interested in finding a monthly
time series which moves ahead of a second series and which may therefore
be used to forecast the second series. The relationship which one hopes
to find is something like the ideal one illustrated in Chart 22.8, although
the cycles would almost never have the regularity shown in this chart.
In Chart 22.8, the forecasting index is seen to move, regularly, ahead of
the series to be forecasted. When such a situation obtains, the earlier-
moving series (that is, the forecasting index) is said to "lead" the other
series. Also, the later-moving series is said to "lag" the earlier-moving
series. One will very rarely find a lag-lead relationship as uniform as that
depicted in Chart 22.8. In fact, since 1941, lagging relationships between
economic time series have not been at all clear-cut, owing first to World
War II and then to the Korean War and to defense production.
However, some lagging relationships do appear, as indicated by the following
statement from the Thirty-Third Annual Report of the National Bureau
of Economic Research:⁵ "Recently, we have laid plans for exploring how
one of the firmest and most important of the Bureau's findings about the
business cycle might be put to current use; namely, that the cycle in
aggregate activity has been invariably preceded by a remarkably regular
cycle in the proportion of individual activities undergoing expansion."

Chart 22.8. Two Illustrative Series Showing One Series Regularly
Preceding the Other. (Vertical axis: per cent.)

⁵ A. F. Burns, Business Cycle Research and the Needs of Our Times, Thirty-Third
Annual Report of the National Bureau of Economic Research, Inc., New York, 1953,
p. 12.

Chart 22.9 shows the Federal Reserve Index of Production of Durable
Manufactures and the Federal Reserve Index of Nondurable
Manufactures for the period January 1946-December 1953. These indexes
were adjusted for seasonal movements by the Federal Reserve Board.
The writers removed trend and smoothed the accidental movements by
means of a three-month moving average weighted 1,2,1. The actual
situation depicted in Chart 22.9 is much different from the illustrative
one shown in Chart 22.8, where one series regularly preceded the other.
Examination of Chart 22.9 reveals several interesting points: the low
points in the Index of Nondurable Manufactures appear to precede
similar low points in the Index of Durable Manufactures in 1947, 1948,
1949, and 1952; the high point in the Index of Nondurable Manufactures
in 19b) seems to precede by some months a high in the other index; in
1950, the high point for the Index of Durable Manufactures precedes the
high for the Index of Nondurable Manufactures.

Chart 22.9. Cyclical Movements of Federal Reserve Index of Production of
Durable Manufactures and of Index of Production of Nondurable
Manufactures, 1946-1953. Data from Table 22.10 and from worksheets (not shown) for
the years omitted from that table. Both indexes were adjusted for trend and for
seasonal and irregular movements, and were expressed as percentage deviations.
(Vertical axis: per cent.)

In general, the Index of Nondurable Manufactures seems to precede
the other index. We shall compute several correlation coefficients to
ascertain when the closest agreement is present. First, correlating the
two series synchronously, we find r = +0.682. Next, pairing the values
with the Index of Nondurable Manufactures leading the Index of
Durable Manufactures by one month, we obtain r = +0.715. (Here
the pairing starts out with January 1946 for the Index of Nondurable
Manufactures paired with February 1946 for the Index of Durable
Manufactures and finishes with November 1953 for the leading series paired
with December 1953 for the lagging series.) Since the lag between the
two series is none too clear in Chart 22.9, we try a pairing with the Index
of Durable Manufactures leading by one month. This yields r =
+0.637, which is lower than the value first obtained, so we will not pursue
the illustration further in this direction.

Trying, now, two months' lead for the Index of Nondurable
Manufactures, the computations for which are indicated in Table 22.10, we
obtain r = +0.728, which is larger than the coefficient for a one-month
lead of that index. (Chart 22.10 shows the two indexes, with the Index
of Durable Manufactures moved two months to the left.) Next, we
compute the correlation coefficient with the Index of Nondurable
Manufactures leading three months, and get r = +0.688, which is smaller than
the value just obtained for two months' lead. Little is to be gained by
computing additional values of r for the purposes of this illustration, so
we will summarize the results, as follows:


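Leading series                                      Value of r

Index of Durable Manufactures leads by:
    One month ....................................    +0.637
Synchronous ......................................    +0.682
Index of Nondurable Manufactures leads by:
    One month ....................................    +0.715
    Two months ...................................    +0.728
    Three months .................................    +0.688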


The highest correlation coefficient was found when the Index of
Nondurable Manufactures led by two months. However, that index would
not serve as a very satisfactory forecasting series for the Index of Durable
Manufactures, because the value of r does not indicate close enough
agreement.
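
The trial of several leads described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the two series are synthetic sine waves (not the Federal Reserve indexes), built so that the second series repeats the first two months later. With 96 monthly values, a two-month lead leaves 94 pairs, as in the illustration.

```python
import math

def pearson_r(xs, ys):
    """Product-moment r from raw sums (the same form used for Table 22.10)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    num = n * sxy - sx * sy
    den = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return num / den

def lead_lag_r(leading, lagging, lead):
    """r when `leading` is paired with `lagging` taken `lead` periods later.
    A negative `lead` tests the reverse hypothesis (the other series leads)."""
    if lead >= 0:
        xs = leading[:len(leading) - lead]
        ys = lagging[lead:]
    else:
        xs = leading[-lead:]
        ys = lagging[:len(lagging) + lead]
    return pearson_r(xs, ys)

def best_lead(leading, lagging, leads):
    """Return the lead with the highest r, together with the whole table of r values."""
    table = {k: lead_lag_r(leading, lagging, k) for k in leads}
    return max(table, key=table.get), table

# Synthetic monthly data (an assumption for illustration only):
# `durable` reproduces `nondurable` two months later.
months = range(96)
nondurable = [math.sin(2 * math.pi * t / 24) for t in months]
durable = [math.sin(2 * math.pi * (t - 2) / 24) for t in months]

best, table = best_lead(nondurable, durable, range(-1, 4))
for k in sorted(table):
    print(f"lead {k:+d} months: r = {table[k]:+.3f}")
```

Real series will not show so sharp a peak; as in the summary above, neighboring leads often give values of r that differ only in the second decimal place.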

It is not always necessary for one time series to lead another one in
order for it to be useful as an indicator of the behavior of the second series.
The Bureau of Business and Economic Research of the University of
Maryland reports* that Baltimore bank debits are correlated +0.9998
with Maryland bank debits, and that Maryland bank debits are correlated
+0.9853 with bank debits in the United States. The Bureau notes that
"turns in direction of the Baltimore series may be expected to indicate
turns in the State and the Nation." The usefulness of this relationship
lies in the fact that data for Baltimore would be available more promptly
than are data for Maryland or for the United States.

* University of Maryland, Bureau of Business and Economic Research, Studies
in Business and Economics, Vol. 6, No. 3, December 1952, "Maryland Economic
Indices," p. 10.

TABLE 22.10

Determination of Correlation Between Federal Reserve Index of Nondurable
Manufactures and Index of Durable Manufactures, January 1946-
December 1953, with the Index of Nondurable Manufactures Leading
by Two Months

(Both indexes have 1947-1949 as the base, are adjusted for seasonal, trend, and
irregular movements, and are expressed as percentage deviations.)


Columns: Year and month; Index of Nondurable Manufactures, X;
indication of pairing; Index of Durable Manufactures, Y; XY; X^2; Y^2.

[Monthly entries, January 1946 through December 1953, are not legible
in this copy.]

Total . . . . . . .   X: 3.3      Y: 18.4

Deseasonalized data from Federal Reserve Bulletin and mimeographed
releases.


$$ r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}} = +0.728, $$

where N = 94, $\Sigma X$ = 3.3, and $\Sigma Y$ = 18.4.


Procedure for use of lead and lag as an aid in forecasting. If it
is desired to make use of a lead-lag relationship to assist in forecasting the
cyclical movements of a series (the lagging series), the procedure may be
as follows:

1. Plot the lagging series on a large sheet of semi-transparent
cross-section paper. The exploratory work in this and the following three
steps may be done with data adjusted for seasonal variation. Trend (unless
it is very marked) and irregular movements need not be removed, although it
is better if they have been eliminated.

2. Consider what series may logically be expected to precede the
lagging series, and plot each of these series on a separate sheet of
semi-transparent graph paper. The horizontal scales used in Steps 1 and 2
must be the same. The vertical scales may be adjusted so that the
fluctuations of the series which are to be compared are roughly the same.

Chart 22.10. Cyclical Movements of Federal Reserve Index of Production
of Durable Manufactures and of Index of Production of Nondurable
Manufactures, 1946-1953, with Index of Durable Manufactures Moved Two Months
to the Left. (Vertical axis: per cent.) Data from Table 22.10 and from
worksheets (not shown) for the years omitted from that table. Both series
were adjusted for trend and for seasonal and irregular movements, and were
expressed as percentage deviations.

3. Place the chart of one of the presumably leading series on top of the
chart for the lagging series (or vice versa), place both above a source of
light, and move the chart of the lagging series to the left until the closest
agreement between the cyclical movements of the two series is obtained.
Chart 22.10 shows how this might appear. If closer agreement is
obtained by moving the leading series to the left, then it doesn't lead;
it lags!

4. Repeat Step 3 for any other series which might move ahead of the
series for which forecasts are desired.

5. When a series has been found that appears regularly to precede the
lagging series, adjust both series for trend and irregular movements and
compute the value of r for the best visual estimate of the lead shown by
the graphs of these adjusted series.

6. Compute the values of r for longer and shorter leads than that used
in Step 5 in order to arrive at the highest value of r. This was two
months in the preceding illustration.

7. If the value of r is high enough to warrant doing so, an estimating
equation of the type

$$ Y_c = a + bX, $$

or possibly a non-linear equation, may be computed. Here, $Y_c$ is the
estimated cyclical value for the lagging series and X is the observed
cyclical value of the leading series. If the probing in Steps 3 and 4 should
reveal more than one leading series, a forecasting equation such as those
for multiple correlation (Chapter 21) would be used.
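
Step 7 can be sketched as follows. This is a minimal illustration, not the authors' worksheet: the helper names are hypothetical, the data are invented, and the lagging series is constructed to equal 0.4 plus twice the leading series two months earlier, so that the least-squares line recovers a = 0.4 and b = 2.

```python
def lagged_pairs(leading, lagging, lead):
    """Pair each leading value with the lagging value `lead` periods later."""
    return leading[:len(leading) - lead], lagging[lead:]

def fit_line(xs, ys):
    """Least-squares constants a and b of the estimating equation Yc = a + bX."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b

# Invented cyclical deviations (assumption for illustration only).
leading = [1.0, -0.5, 2.0, 0.0, -1.5, 0.5, 1.5, -1.0]
lead = 2
lagging = [0.0, 0.0] + [0.4 + 2.0 * x for x in leading[:-lead]]

xs, ys = lagged_pairs(leading, lagging, lead)
a, b = fit_line(xs, ys)

# The last `lead` values of the leading series have no partner yet;
# they supply the forecasts for the next `lead` months of the lagging series.
forecasts = [a + b * x for x in leading[-lead:]]
print(a, b, forecasts)
```

The forecast horizon is never longer than the lead itself, which is one reason a short or unstable lead is of limited practical use.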

One investment advisory service* has been using multiple correlation,
with one independent variable leading by a year, to obtain a rating for
stocks. In this analysis, the dependent variable is the average annual
price of a stock, while the independent variables are: annual dividends
per share, annual earnings per share, the average monthly price of the
stock for the preceding year, a measure of "market climate" or sentiment,
and time. Market climate itself is obtained by a process of multiple
correlation and represents the difference, over a long period of time,
between a composite stock price average and estimates of that average
based on earnings, dividends, and time.

The slowness with which most economic and business data are reported
and the scarcity of time series on a basis shorter than a month are factors
that impair the usefulness of correlation as a forecasting device. It is
quite possible that weekly, daily, or hourly data might bring to light
relationships which are known and utilized only by a few "insiders."
The theorist argues that all economic processes are interrelated. It does
not seem logical that the cause-and-effect relationships which supposedly
surround us on every side must always take a month or more for their
development. There must be many that work out in a few days, a few
hours, or nearly instantaneously. If the market hears that a new
industrial use has suddenly been announced for copper, it does not wait weeks
or even hours to show its reaction in a price change. As data are made
available upon a weekly, daily, or more frequent basis, it is conceivable
that very useful lag-lead relationships may be obtained.

Some cautions. It may have been noticed that the heading of the
preceding section referred to the use of lead and lag as an aid in
forecasting. A leading correlation that has been observed for a number of
years in the past will not be applicable to future months unless the
relationship between the series continues as before. If underlying economic
(or other) conditions change, the relationship may be altered.
Forecasting by this, or any other, device should be attempted only in
connection with a thorough knowledge of the series under consideration and of
the conditions affecting those and allied series.

* The Value Line Investment Survey, 5 East 44th Street, New York City.

The use of lead-lag correlations in forecasting is also subject to other
objections or shortcomings. Among these are:

1. As pointed out in Chapter 19, the value of r may be unduly
influenced by one or a few extreme values. Some statisticians even argue
that one's visual impression of the amount of lead is preferable.

2. The lag may be different at recession from what it is at revival.

3. Interest often centers mainly on turning points, while r gives equal
importance to leads and lags at all phases of the cycle. It may be
profitable to be able to foretell merely when to expect a change in
direction, even though the amount of change cannot be forecast.

4. It is a laborious process to compute r for a large number of lead-lag
hypotheses.

5. In addition to criticisms of the coefficient of correlation as a measure
of relationship for time series, one may also criticize the nature of the
variations correlated, arguing that a person can more accurately predict
the future with respect to the present than he can with respect to some
normal, which is often difficult to estimate correctly.

In Chapter 26, attention will be given to the reliability of correlation
coefficients computed from random samples. Since the coefficients
obtained from lead-lag relationships are not from random samples, the
procedures in Chapter 26 are not applicable to the correlation coefficients
for leading and lagging series.
A: when tossing a die, the occurrence of a white side. A has no numerical
value.

$\alpha_3$: lower-case Greek alpha, a measure of skewness, $\sqrt{\beta_1}$.
See Chapter 10.

B: when tossing a die, the non-occurrence of a white side. B has no
numerical value.

$\beta_1$, $\beta_2$: lower-case Greek beta; respectively, measures of
skewness and kurtosis. See Chapter 10.

c: a correction for skewness sometimes used in fitting a logarithmic normal
curve.

$C_0, C_1, C_2, \ldots$: the binomial coefficients.

d': deviation, in terms of class intervals, of an X value from $\bar{X}_d$.

e: 2.71828; the limit of the series $1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots$.

f: a frequency.



areas of Appendix K. , 


in fitting the seeond -Hp]>ro\imation curve, the tabl(*d values of 

Appmidix F which, wlnm multij)li(‘d by give the inodifleation for 
skewm'ss. 

h: in coin tossing, the occurrence of a head.
i: the class interval.
k: the number of samples.

N: the number of items in a sample.

$\nu_1$, $\nu_2$, $\nu_3$: lower-case Greek nu; the first, second, and third
moments about a selected origin. See Chapter 10.

p: the proportion of occurrences in a sample.

$\pi$: lower-case Greek pi; in the expression for the normal curve, the
constant 3.14159; in the binomial, the proportion of occurrences in a
population.

$\pi_2$, $\pi_3$: lower-case Greek pi; the second and third moments about
$\bar{X}$. See Chapter 10.

q: the proportion of non-occurrences in a sample.

Q: the quartile deviation or semi-interquartile range. See Chapter 10.

$Q_1$, $Q_2$, $Q_3$: the quartiles. See Chapter 9.

s: the standard deviation of a sample. See Chapter 10.

$s_{\log}$: the standard deviation of the logarithms of a series of sample
values.
$Sk_{\log}$: a coefficient of skewness based on the logarithms of the
quartiles.
$\sigma$: lower-case Greek sigma. The standard deviation of a population.
$\hat{\sigma}$: the estimated standard deviation of a population, computed
from a single sample. Referred to as "sigma caret" or "sigma hat." See
Chapter 24.

t: in coin tossing, the occurrence of a tail or the non-occurrence of a head.
$\tau$: lower-case Greek tau; the proportion of non-occurrences in a
population.

x: $X - \bar{X}$.

X: a value of the X-series.

$\bar{X}$: the arithmetic mean. See Chapter 9.

$\bar{X}_d$: a designated mean. See Chapter 9.

$\bar{X}_{\log}$: the arithmetic mean of a series of logarithms.
$x_{\log}$: $\log X - \bar{X}_{\log}$.

$Y_c$: a computed ordinate of a fitted curve.

$Y_o$: the computed ordinate of the normal curve at $\bar{X}$.
$\int_{\bar{X}}^{X} f(x)\,dx$: proportionate area under a curve from
$\bar{X}$ to X.



CHAPTER 23


Describing a Frequency Distribution
by a Fitted Curve


A frequency distribution usually represents a sample drawn from a
much larger population or universe. Even though a sample is composed
of but a few hundred or a few score items, it may be reasonably
representative of the larger universe from which it was drawn. Since it is
virtually never possible to measure all of the individuals or items
comprising a universe, we must form our notion of the larger group from a
study of a sample. We may therefore fit any one of a number of types of
curves to a frequency distribution in order to attempt to describe what
appears to be the general form of the curve for the entire population.

The purpose in fitting a curve to a frequency distribution may be any
one of the following:

(1) We may wish to ascertain whether a given curve describes the
general shape of the distribution. For example, we may wish to
demonstrate that the chance errors involved when making repeated
measurements of the same object or phenomenon may be described by a normal
curve. Chart 23.1 is a normal curve and Chart 23.2 shows such a curve
fitted to a series of repeated measurements.

(2) Somewhat similar to the foregoing is the fitting of a curve to values
obtained from repeated samples taken from the same population. An
illustration of this is included as Exercises … and XVI in the third
edition of the Workbook* designed to accompany this text. In those
exercises, a normal curve is fitted to a frequency distribution of arithmetic
means computed from random samples. While sample arithmetic means
tend to form a normal curve around the arithmetic mean of the
population, other statistical values may form other types of curves. Further
consideration will be given to the behavior of values computed from
samples in Chapters 24, 25, and 26.

* F. E. Croxton, Workbook in Applied General Statistics, Third Edition,
Prentice-Hall, Inc., New York, 1950.

(3) It may be desired to generalize concerning the proportions of items
which should be expected to fall above, below, or between certain values.
For example, we may take the case of fitting a curve to a frequency
distribution of the length of life of incandescent lamp bulbs; from such a
procedure we are enabled to infer what proportion might, in general, be
expected to burn 1,500 hours or more (or more or less than any specified
number of hours). Similarly, in the case of the data shown in Charts
23.5 and 23.6, we may determine the number of items which in general
would be expected to occur above, below, or between any two X values.
In like fashion, the life insurance actuary may fit a curve to, or graduate
data having to do with, deaths classified by age and thus determine the
expected number of individuals dying during each year of life or surviving
given ages.

Chart 23.1. The Normal Curve.

(4) Sometimes it is possible to determine, from a curve fitted to a given
distribution, the probable distribution of values in a closely associated
series. For example, a normal curve fitted to the measurements of the
circumferences of men's necks enables us to ascertain the probable number
of collars of each size which would be needed. This has been done in
Chart 23.8 and Table 23.5.

This chapter will not attempt a comprehensive treatment of the topic
of fitting frequency curves. We shall consider only the symmetrical
curve known as the normal and then, briefly, binomials and two
of the simpler skewed curves.

THE NORMAL CURVE 

Development of the normal curve. The concept of the normal
curve (pictured in Chart 23.1) appears to have been originally developed
by Abraham De Moivre and explained in 1733 in a mathematical treatise¹
which its author believed had no practical applications other than as a
solution of problems encountered in games of chance. Gauss later used
the curve to describe the theory of accidental errors of measurements
involved in the calculation of orbits of heavenly bodies. Because of
Gauss' work, this curve is sometimes referred to as the Gaussian curve.
Chart 23.2 shows a column diagram of 144 measurements of a line² and
a normal curve of error fitted to these measurements. Concerning the
normal curve, it will be observed: (1) that small errors are more frequent
than large ones, (2) that very large errors are unlikely to occur, and (3)
that positive and negative errors of the same numerical magnitude are
equally likely to occur. Because the normal curve has been used
extensively to describe errors of measurement, it is sometimes referred to as the
"normal curve of error." However, this term is misleading, since errors
of measurement, even though unbiased, do not always follow the normal
curve.³

Chart 23.2. Normal Curve Fitted to 144 Measurements of the
Length of a Line. (Vertical axis: number of measurements; horizontal
axis: length in feet.) Measurements from L. D. Weld, Theory of Errors and
Least Squares, p. 147, The Macmillan Company, New York, 1916.

Chart 23.3. Apparatus to Illustrate the Expansion of the Binomial
$(\frac{1}{2} + \frac{1}{2})^n$.

¹ Approximatio ad Summam Terminorum Binomii $(a + b)^n$ in Seriem expansi,
Nov. 12, 1733, being a second supplement to Miscellanea Analytica, 1730. See
Karl Pearson, "Historical Note on the Origin of the Normal Curve of Errors,"
Biometrika, Vol. 16 (1924), pp. 402-404; also, Helen M. Walker, Studies in
the History of Statistical Method, pp. 13-17, 22-23, Williams and Wilkins,
Baltimore, 1929.

² The 144 measurements are from L. D. Weld, Theory of Errors and Least
Squares, p. 147, The Macmillan Company, New York, 1916.

³ See N. R. Campbell, An Account of the Principles of Measurement and
Calculation, Ch. IX, especially p. 182, note 1, Longmans, Green & Co.,
London, 1928.

Explanation of the formula. Chart 23.3 pictures an apparatus
which will help us to understand the formula for the normal curve. The
device consists of a number of troughs, open at one end and placed as
shown in section A of Chart 23.3. Trough d is filled with sand or some
similar granular substance. If the apparatus is tipped so that the
left-hand side rises (section B of Chart 23.3), the sand in trough d will
flow 1/2 into trough j and 1/2 into trough k. This represents the binomial
$(\frac{1}{2} + \frac{1}{2})^1$. If the right-hand side of the machine is
then raised (section C of Chart 23.3), the sand from j will flow 1/2 into c
and 1/2 into d, while the sand from k will flow 1/2 into d and 1/2 into e.
Of the total amount of sand, we now have 1/4 in c, 2/4 in d, and 1/4 in e,
representing the expansion of the binomial $(\frac{1}{2} + \frac{1}{2})^2$.
Again tipping the device, as in section D of Chart 23.3, 1/2 of the sand
from c flows into i, and 1/2 into j; 1/2 of the sand from d flows into j,
and 1/2 into k; and 1/2 of the sand from e flows into k, and 1/2 into l.
The result is that 1/8 of all the sand is in i, 3/8 is in j, 3/8 is in k,
and 1/8 is in l, representing the expansion of the binomial
$(\frac{1}{2} + \frac{1}{2})^3$. Tipping the apparatus as in section E of
Chart 23.3 causes the sand to flow 1/16 into b, 4/16 into c, 6/16 into d,
4/16 into e, and 1/16 into f, representing the expansion of
$(\frac{1}{2} + \frac{1}{2})^4$. Once more tipping the machine (section F of
Chart 23.3) results in putting 1/32 of the sand into h, 5/32 into i, 10/32
into j, 10/32 into k, 5/32 into l, and 1/32 into m, which is the expansion
of $(\frac{1}{2} + \frac{1}{2})^5$.

Chart 23.4A. Expected Results of 10,000 Tosses of Four Coins.
While the above illustration is instructive and gives us a picture of the
expanded binomial, the device would become clumsy if we attempted to carry
the expansion of the binomial much farther. We may obtain similar results
by tossing coins, a procedure which eliminates the necessity of
constructing any apparatus. It is assumed that we are tossing perfect coins
which are evenly balanced and which will not stand on edge. With such a
coin, the chances of throwing a tail or a head are identical and may be
expressed by $\frac{1}{2}t + \frac{1}{2}h$.

Chart 23.4B. Expected Results of 10,000 Tosses of Six Coins.
If two coins are tossed simultaneously, we may obtain either no heads
(two tails), a tail and a head, or two heads. In order for no heads to
appear, both coins must fall tails up. To obtain one head, the first coin
may show a tail and the second a head, or the first coin may show a head and
the second a tail. Two heads may appear only if both coins show heads.
Since one head may occur in two ways, while no heads may occur in but
one way, it follows that there is twice as great a probability of throwing
one head as of throwing no heads. Similarly, there is twice as great a
chance of throwing one head as there is of throwing two heads. We may
express the probabilities arising from tossing two coins by

$$ (\tfrac{1}{2}t + \tfrac{1}{2}h)^2, $$

in which the exponent 2 indicates the number of coins being tossed.
Expanding this binomial gives

$$ \tfrac{1}{4}t^2 + \tfrac{1}{2}th + \tfrac{1}{4}h^2. $$

Therefore, if two perfect coins are thrown 1,200 times, we could expect to
obtain $t^2$ (no heads) 300 times, $th$ (one head) 600 times, and $h^2$
(two heads) 300 times.

If three coins are tossed, we have the expression

$$ (\tfrac{1}{2}t + \tfrac{1}{2}h)^3 = \tfrac{1}{8}t^3 + \tfrac{3}{8}t^2h + \tfrac{3}{8}th^2 + \tfrac{1}{8}h^3, $$

indicating that, if 1,200 throws were made, there should be no heads 150
times, one head 450 times, two heads 450 times, and three heads 150 times.

Chart 23.4C. Expected Results of 10,000 Tosses of Ten Coins. The
probability of each combination is indicated by the binomial expansion
shown under each part of Chart 23.4.
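
The expected frequencies quoted above can be checked with a few lines of Python. This is a sketch only; the function name `expected_counts` is hypothetical, and the coins are assumed to be perfect, so that each term of the expansion is a binomial coefficient divided by 2 raised to the number of coins.

```python
from math import comb

def expected_counts(coins, throws):
    """Expected number of throws showing 0, 1, ..., `coins` heads."""
    return [throws * comb(coins, heads) / 2 ** coins
            for heads in range(coins + 1)]

print(expected_counts(2, 1200))    # two coins, 1,200 throws
print(expected_counts(3, 1200))    # three coins, 1,200 throws
print(expected_counts(4, 10000))   # the case pictured in Chart 23.4A
```

The same function gives the expected results for six and ten coins by changing the first argument.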

The results to be expected from tossing 4 coins are shown in section A of
Chart 23.4, while the results to be expected from tossing 6 and 10 coins are
shown, respectively, in parts B and C. All of these curves are
symmetrical, and, as the number of coins tossed becomes greater, the curve
becomes smoother. When ten coins are tossed, there are eleven points to be
plotted (see part C); but if 100 coins were tossed, there would be 101
points to plot and the curve would appear virtually the same as that of
Chart 23.1. In fact, it can be shown* that, as N approaches infinity,
$(\frac{1}{2}t + \frac{1}{2}h)^N$ approaches as a limit

$$ Y_c = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}, $$

which is the expression for the normal curve. The symbols are as follows:

$Y_c$ = the computed height of an ordinate at distance x from the
arithmetic mean;

$\sigma$ = the standard deviation of the population;

$\pi$ = the constant, 3.14159; $\sqrt{2\pi}$ = 2.5066;

e = the constant, 2.71828, the base of the Napierian system of
logarithms; and

x = a selected deviation from the arithmetic mean.


Substituting the two constants mentioned above, we may write the
equation

$$ Y_c = \frac{1}{2.5066\,\sigma}\, 2.71828^{-\frac{x^2}{2\sigma^2}}. $$
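
How closely the binomial approaches this limit can be seen in a short Python sketch. It is an illustration under assumptions not stated in the text: the function names are hypothetical, and the standard deviation of the number of heads in N tosses of perfect coins is taken as $\sqrt{N}/2$, the usual binomial result.

```python
import math

def normal_ordinate(x, sigma):
    """Height of the normal curve at deviation x from the mean."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def binomial_probability(n, heads):
    """Probability of exactly `heads` heads when n perfect coins are tossed."""
    return math.comb(n, heads) / 2 ** n

n = 100
mean = n / 2
sigma = math.sqrt(n) / 2   # assumed: sd of the number of heads

largest_gap = max(
    abs(binomial_probability(n, h) - normal_ordinate(h - mean, sigma))
    for h in range(n + 1)
)
print(round(math.sqrt(2 * math.pi), 4))   # the constant written as 2.5066 in the text
print(largest_gap)                        # small compared with the peak ordinate
```

For 100 coins the largest discrepancy is a small fraction of the central ordinate, which is why the plotted points would be hard to tell from the smooth curve of Chart 23.1.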


FITTING THE NORMAL CURVE

In Chart 23.2 a normal curve was shown fitted to a series of
measurements of a line. It will be observed that those figures were repeated
measurements of the same thing. In Chart 23.5 we have a different type
of data, representing measurements of a number of individuals from a
homogeneous population. The chance errors involved in repeated
measurements of the same thing not infrequently follow a normal curve.
However, the measurements of a number of different individuals in
respect to some characteristic may or may not follow such a curve. A
distribution of the heights of a homogeneous group of adult individuals,
for example, could be expected to be essentially normal, but a distribution
of the weights of the same individuals would be noticeably skewed to the
right. While the basal diameter of the egg-capsules of the snails in Chart
23.5 may be described by the fitted normal curve, it is quite likely that
the weights of these same eggs would show definite skewness.

The fitted curve in Chart 23.5 indicates the shape of the distribution
we should expect if our sample were much larger, or if we had measured
the entire population. It implies that, if a larger group were studied, we
should find a few instances with basal diameters both smaller and larger
than those found in the sample.

* See G. U. Yule and M. G. Kendall, An Introduction to the Theory of
Statistics, Hafner Co., New York, 1950, pp. 177-181.

Another limit of the binomial is the Poisson distribution, which the binomial
approaches if one of the fractions is very small and N approaches infinity.
Fitting the Poisson distribution is described in F. E. Croxton, Elementary
Statistics with Applications in Medicine, Prentice-Hall, Inc., New York,
1953, pp. 201-206.

Fitting the normal curve to data of physical ability. Table
23.1 shows a distribution of the distances which 303 high school freshman
girls were able to throw a baseball. These data are akin to those from
which Chart 23.5 was drawn in that they are measurements for a number
of different individuals. It may be observed that very few of the girls
threw the baseball less than 45 feet and very few threw it 113 feet or
farther. The column diagram of Chart 23.6 shows the data of Table 23.1.

Chart 23.5. Normal Curve Fitted to Basal Diameters of 99
Egg-Capsules of a Marine Snail, Sipho curtus. (Vertical axis: number of
egg-capsules; horizontal axis: basal diameter in millimeters.) Data of
basal diameters from Gunnar Thorson, Studies on the Egg-Capsules and
Development of Arctic Marine Prosobranchs, p. 7, Meddelelser om Grønland,
udgivne af Kommissionen for Videnskabelige Undersøgelser i Grønland.

To fit a normal curve to an observed frequency distribution, we rewrite
the equation

$$ Y_c = \frac{Ni}{2.5066\,s}\, 2.71828^{-\frac{x^2}{2s^2}}, $$

where N is the number of observations in the sample,
i is the class interval of the sample distribution, and
s is the standard deviation of the sample.

We could use $\hat{\sigma} = \sqrt{\dfrac{\Sigma x^2}{N-1}}$, an estimate of
$\sigma$, which is discussed in the following chapter, instead of s when
fitting a normal curve to a set of observed data. However, we ordinarily
prefer s, since it measures the dispersion of a sample of the observed
size, rather than being an estimate of the dispersion in the population.
Furthermore, for a frequency distribution having a large enough N to
warrant the fit of a normal curve, the difference between s and
$\hat{\sigma}$ is so slight as to have little or no effect on the fit. For
the data of Table 23.1, for example, s = 20.95 feet and
$\hat{\sigma}$ = 20.98 feet.
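
The two measures of dispersion just quoted can be reproduced from the grouped data of Table 23.1 with the following Python sketch. Several points are assumptions: each class is represented by its mid-value (20, 30, ..., 140 feet), the variable names are illustrative, and the first three class frequencies (1, 2, and 7) are read as the only values consistent with the stated total of 303 and the stated mean of 80.63 feet.

```python
import math

# Mid-values of the classes "15 but under 25", ..., "135 but under 145".
midpoints = list(range(20, 150, 10))
# Class frequencies; the first three are inferred as described in the lead-in.
freqs = [1, 2, 7, 25, 33, 53, 64, 44, 31, 27, 11, 4, 1]

N = sum(freqs)
mean = sum(f * m for f, m in zip(freqs, midpoints)) / N

# s: standard deviation of the sample itself (divisor N).
sum_x2 = sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints))
s = math.sqrt(sum_x2 / N)

# sigma_hat: estimate of the population standard deviation (divisor N - 1).
sigma_hat = math.sqrt(sum_x2 / (N - 1))

print(N, round(mean, 2), round(s, 2), round(sigma_hat, 2))
```

With N as large as 303 the two divisors differ by less than half of one per cent, so the choice between s and the estimate has no visible effect on the fitted curve.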

The complete fitting process consists of two steps: first, the
determination of the values of a number of ordinates in order to ascertain
the exact outline of the fitted curve, and, second, the computation of the
proportionate areas for the portions of the curve that are important to us.


TABLE 23.1

Baseball Throws for Distance by 303 First-Year High School Girls

    Distance in feet            Number of girls

     15 but under  25                  1
     25 but under  35                  2
     35 but under  45                  7
     45 but under  55                 25
     55 but under  65                 33
     65 but under  75                 53
     75 but under  85                 64
     85 but under  95                 44
     95 but under 105                 31
    105 but under 115                 27
    115 but under 125                 11
    125 but under 135                  4
    135 but under 145                  1

    Total . . . . . . . . .          303

Data from Leonora W. Stewart and Helen Weet, The Froebel School, Gary,
Indiana.
Ordinates. Referring again to the formula for the normal curve,

$$ Y_c = \frac{Ni}{2.5066\,s}\, 2.71828^{-\frac{x^2}{2s^2}}, $$

it appears that we need the values of N, $\bar{X}$, and s in order to fit a
normal curve to a distribution. Computing by procedures described in
preceding chapters, we find $\bar{X}$ = 80.63 feet and s = 20.95 feet. As
there were 303 girls, N = 303.

We shall first compute the ordinate to be erected at the mean. This is
designated as $Y_o$ and is the maximum ordinate of the fitted curve. Since
x = 0 at the mean, we have

$$ Y_o = \frac{303 \times 10}{2.5066 \times 20.95}\, 2.71828^{-\frac{0^2}{2(20.95)^2}}. $$

In the expression above, the exponent of 2.71828 is zero. Since a number
raised to the zero power is one, $2.71828^{0} = 1$. It is apparent, then,
that the expression $e^{-\frac{x^2}{2s^2}}$ is always equal to 1 for the
ordinate erected at the mean and

$$ Y_o = \frac{Ni}{2.5066\,s}. $$

Therefore,

$$ Y_c = \frac{Ni}{2.5066\,s}\, 2.71828^{-\frac{x^2}{2s^2}} = Y_o\, 2.71828^{-\frac{x^2}{2s^2}}. $$

For the problem in hand,

$$ Y_o = \frac{303 \times 10}{2.5066 \times 20.95} = 57.7. $$

We now wish to erect enough additional ordinates on either side of $Y_o$ to
enable us to sketch a reasonably smooth curve. If we select successive
distances of 4.19 feet from the mean, we shall erect ordinates at steps of
0.2s from the mean. The first pair of ordinates (since the curve is
symmetrical) are to be erected at x = ±4.19 feet from the mean (X = 84.82
and 76.44 feet), using the expression

$$ Y_c = 57.7 \times 2.71828^{-\frac{(4.19)^2}{2(20.95)^2}}. $$

In order to determine the value of $Y_c$, it is not necessary to compute
$2.71828^{-\frac{(4.19)^2}{2(20.95)^2}}$ but merely to refer to Appendix D.
Looking up the appropriate value of $\frac{x}{s}$, which in this case is
$\frac{4.19}{20.95} = 0.20$, we find that

$$ 2.71828^{-\frac{(4.19)^2}{2(20.95)^2}} = 0.98020 $$

and

$$ Y_c = 57.7 \times 0.98020 = 56.6. $$

For the next pair of ordinates, x = ±8.38 feet (X = 89.01 feet and 72.25
feet) and

$$ Y_c = 57.7 \times 2.71828^{-\frac{(8.38)^2}{2(20.95)^2}}. $$
TABLE 23.2

Determination of Ordinates of Normal Curve Fitted to Data of
Baseball Throws for Distance by First-Year High School Girls

(X-bar = 80.63 feet; s = 20.95 feet; Y_o = 57.7)

    X              x               x/s     Proportionate      Height of
 (in feet,     (in feet,                   height of          ordinate
 where         deviation                   ordinate,          [Col. 4 x Y_o]
 ordinates     of X from                   2.71828^(-x^2/2s^2)
 are to be     X-bar)                      (Appendix D)
 erected)
   (1)            (2)             (3)          (4)               (5)

  13.59         -67.04           3.20        0.00598             0.3
  17.78         -62.85           3.00        0.01111             0.6
  21.97         -58.66           2.80        0.01984             1.1
  26.16         -54.47           2.60        0.03405             2.0
  30.35         -50.28           2.40        0.05614             3.2
  34.54         -46.09           2.20        0.08892             5.1
  38.73         -41.90           2.00        0.13534             7.8
  42.92         -37.71           1.80        0.19790            11.4
  47.11         -33.52           1.60        0.27804            16.0
  51.30         -29.33           1.40        0.37531            21.7
  55.49         -25.14           1.20        0.48675            28.1
  59.68         -20.95           1.00        0.60653            35.0
  63.87         -16.76           0.80        0.72615            41.9
  68.06         -12.57           0.60        0.83527            48.2
  72.25          -8.38           0.40        0.92312            53.3
  76.44          -4.19           0.20        0.98020            56.6
  80.63           0              0           1.00000            57.7
  84.82          +4.19           0.20        0.98020            56.6
  89.01          +8.38           0.40        0.92312            53.3
  93.20         +12.57           0.60        0.83527            48.2
  97.39         +16.76           0.80        0.72615            41.9
 101.58         +20.95           1.00        0.60653            35.0
 105.77         +25.14           1.20        0.48675            28.1
 109.96         +29.33           1.40        0.37531            21.7
 114.15         +33.52           1.60        0.27804            16.0
 118.34         +37.71           1.80        0.19790            11.4
 122.53         +41.90           2.00        0.13534             7.8
 126.72         +46.09           2.20        0.08892             5.1
 130.91         +50.28           2.40        0.05614             3.2
 135.10         +54.47           2.60        0.03405             2.0
 139.29         +58.66           2.80        0.01984             1.1
 143.48         +62.85           3.00        0.01111             0.6
 147.67         +67.04           3.20        0.00598             0.3

Here the ratio of $x/s$ is 0.40 and, referring to Appendix D, we have

$$Y_c = 57.7 \times 0.92312 = 53.3.$$

The process of determining the heights of the ordinates can be handled
most expeditiously by use of a table similar to Table 23.2. The ordinates
in the upper and lower parts of the table are identical, since the fitted
curve is symmetrical.
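
The ordinate computation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ordinate` and the constant names are hypothetical, the values of the mean, standard deviation, and maximum ordinate are those quoted in the text, and the exponential is evaluated directly rather than looked up in Appendix D.

```python
import math

MEAN = 80.63   # X-bar in feet, as given in the text
S = 20.95      # standard deviation in feet, as given in the text
Y0 = 57.7      # maximum ordinate (at the mean), as given in the text


def ordinate(X, mean=MEAN, s=S, y0=Y0):
    """Height of the fitted normal curve at the point X (in feet)."""
    x = X - mean                                  # deviation from the mean
    return y0 * math.exp(-x ** 2 / (2 * s ** 2))  # Y0 * e^(-x^2 / 2s^2)


if __name__ == "__main__":
    # Ordinates at steps of 0.2 s on either side of the mean,
    # the spacing used in Table 23.2.
    for k in range(-16, 17):
        X = MEAN + k * 0.2 * S
        print(f"{X:7.2f}  {k * 0.2:+5.2f}  {ordinate(X):5.1f}")
```

Because the curve is symmetrical, only the rows at or above the mean need to be computed by hand; the sketch simply evaluates every row.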

The fitted curve is shown in Chart 23.6. It follows the general shape of
the sample, but smooths out the irregularities and indicates what might 
be expected if the performance of a very large number of comparable girls 
could be recorded. What we have done so far gives merely the shape of 
the fitted curve and a visual impression of the suitability of the fit, which 
appears good in this instance. 

Areas. We have not yet undertaken to say what proportion of high
school freshman girls may be expected to throw a baseball: (1) any
specified number of feet or more, (2) any specified number of feet or less,
or (3) a distance equal to or greater than one specified value but equal to
or less than another larger value. Neither have we attempted to say
what proportion of girls may be expected to fall into each of the various
classes of the frequency distribution. Expected frequencies are ascertained
by integrating the fitted curve. However, the procedure is
greatly simplified, and no knowledge of integration is needed, if we make
use of a table of the areas under the normal curve such as Appendix E.
This appendix gives the proportionate area under the curve which is
between an ordinate at $\bar{X}$ and an ordinate at specified $x/s$ distances in
either direction (not both directions) from $\bar{X}$. This statement is illustrated
by the small chart shown with Appendix E. The largest proportionate
area shown in Appendix E is 0.50, since the area under the entire
curve is 1.0.

[Chart 23.6. Normal Curve Fitted to Data of Baseball Throws for
Distance by First-Year High School Girls. Data from Tables 23.1
and 23.2. Vertical axis: number of girls.]

To ascertain the proportion of girls that may be expected to throw a 
baseball 100 feet or more, we first determine the proportion that may be 
expected between the values of $\bar{X} = 80.63$ feet and $X = 100$ feet and
then subtract this proportion from 0.50. At $X = 100$ feet, $x = 100 - 80.63 = 19.37$ feet, and, since $s = 20.95$,

$$\frac{x}{s} = \frac{19.37}{20.95} = 0.92.$$


Referring to Appendix E, it appears that 0.3212 of the area is between the 
two values, and therefore 0.50 — 0.3212 = 0.1788, or about 18 per cent, 
of the area is at or beyond X = 100 feet. 

If we wish to know what proportion of girls may be expected to throw 
a baseball 50 feet or less, the procedure parallels that just given. The 
reader should work this out for himself. The answer is 7.2 per cent.

We can avoid the subtractions involved in the two preceding paragraphs
if we refer to Appendix G, which shows areas in one tail of the normal
curve. This appendix and Appendix H, which gives areas in two tails 
of the normal curve, will be particularly useful in connection with part 
of the subject matter of Chapter 24. 

To determine the proportion of girls who may be expected to throw a 
baseball between 87 and 100 feet, we compute the area under the curve 
from $\bar{X} = 80.63$ feet to $X = 87$ feet, and the area from $\bar{X} = 80.63$ feet to
100 feet, and then take the difference between these two figures. The
first proportionate area is obtained by using $x = 6.37$ feet and

$$\frac{x}{s} = \frac{6.37}{20.95} = 0.30.$$

Appendix E shows that 0.1179 of the area is between $\bar{X} = 80.63$ feet and
$X = 87$ feet. We already know that 0.3212 of the area is between
$\bar{X} = 80.63$ feet and $X = 100$ feet, so the proportionate area between
87 feet and 100 feet is

$$0.3212 - 0.1179 = 0.2033, \text{ or about 20 per cent.}$$
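
The three area calculations above can be checked with a short sketch. It is an illustration only: instead of Appendix E it uses the error function from Python's standard library, and the helper names `area_mean_to` and `z_of` are hypothetical. The ratio $x/s$ is rounded to two decimals to imitate a table lookup, which is an assumption about how the text's figures were obtained.

```python
import math

MEAN, S = 80.63, 20.95   # feet, as given in the text


def area_mean_to(z):
    """Proportionate area under the normal curve between the mean and a
    point z standard deviations away (the quantity tabulated in Appendix E)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


def z_of(X):
    """x/s for a distance X, rounded to two decimals as in a table lookup."""
    return round((X - MEAN) / S, 2)


# (1) 100 feet or more: subtract the area from the mean to 100 feet from 0.50.
p_100_or_more = 0.5 - area_mean_to(z_of(100))

# (2) 50 feet or less: the same procedure in the lower tail.
p_50_or_less = 0.5 - area_mean_to(z_of(50))

# (3) between 87 and 100 feet: difference of two areas measured from the mean.
p_87_to_100 = area_mean_to(z_of(100)) - area_mean_to(z_of(87))

if __name__ == "__main__":
    print(round(p_100_or_more, 4), round(p_50_or_less, 4), round(p_87_to_100, 4))
```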


Referring to Table 23.3, the expected frequencies in each class of the 
frequency distribution are obtained as follows:


1. In Column (1) of the table, enter the classes of the original distribution,
allowing for one or two additional classes at each end, since the
fitted curve should usually have a greater range than the sample. Theoretically
the fitted curve is of unlimited range in both directions. Allow
two spaces for the class in which the mean falls.

2. In Column (2), write the lower limits of each class below the mean
in value and the lower limit of the class which contains the mean.

3. In Column (3), write the upper limit of each class above the mean
in value and the upper limit of the class which includes the mean.

... frequencies. This is of importance in making the $\chi^2$ test of Table 25.10.



[Chart 23.7. Graphic Representation of the Procedure in Columns (6) and
(7) of Table 23.3. Horizontal axis: distance in feet, marked at 75, 85, 95,
and 105, with $\bar{X}$ at 80.63.]

4. We shall ascertain first the proportionate area between the mean
(80.63 feet) and the upper limit (85 feet) of the class in which the mean
falls. The deviation of the upper limit from the mean is 4.37 feet; this
value is entered in Column (4). Since $s = 20.95$ feet,

$$\frac{x}{s} = \frac{4.37}{20.95} = 0.21.$$

This value is entered in Column (5). Now, looking up 0.21 in Appendix
E, we find that 0.0832 of the area is between the mean and 85 feet. This
value is entered in Column (6). The procedure is shown graphically in
Chart 23.7.

5. The next step consists of determining the proportionate area between
the mean and the upper limit of the first class above the mean. This
limit is 95 feet; $x = 14.37$ feet and

$$\frac{x}{s} = \frac{14.37}{20.95} = 0.69.$$

Looking up 0.69 in Appendix E shows that 0.2549 of the area would be
expected to be between the mean and 95 feet. This value is entered in
Column (6). If 0.2549 of the area is found between 80.63 and 95 feet,
while 0.0832 of the area occurs between 80.63 and 85 feet, there would
be $0.2549 - 0.0832 = 0.1717$ of the area between 85 and 95 feet. The
result of this subtraction is entered in Column (7); this procedure is also
indicated graphically in Chart 23.7.

6. The procedure in Step 5 is repeated for each class above the mean in
value. The proportionate areas from the mean to the upper limit of
each class are ascertained, and then the proportions from the mean to the
upper limit of the preceding class are subtracted, as shown in the table.

7. The proportionate areas between the mean and the lower limits
shown in Column (2) of the table are next determined. Since these
areas are also cumulative, successive subtraction is again necessary.

8. We now have entered in Column (7) the proportionate areas for
each class except the class containing the mean. We have determined,
in Column (6), that 0.0832 of the area is between the mean and 85 feet,
and that there is 0.1064 of the area between the mean and 75 feet. Adding
these two figures gives 0.1896, the proportion of the area in this class
[see Column (7) and Chart 23.7].

9. The total of Column (7) should be 1.0000, as there is 0.5000 of the
area from the mean to either extreme of the distribution. In order to see
the agreement between the observed and the expected frequencies, we
include Column (8), which is obtained by multiplying 303 by the proportionate
area of each class.
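
The nine steps can be condensed into a brief sketch. It is illustrative only: the names are hypothetical, the class limits (10-foot classes from 5 to 165 feet) are an assumption chosen to extend the sample classes at each end, and the areas come from the standard-library error function rather than from Appendix E, with $x/s$ rounded to two decimals to imitate the table lookup.

```python
import math

MEAN, S, N = 80.63, 20.95, 303   # mean and s in feet, number of girls (from the text)


def signed_area_from_mean(limit):
    """Area between the mean and a class limit; negative for limits below the mean."""
    z = round((limit - MEAN) / S, 2)              # x/s to two decimals, as in a table
    return math.copysign(0.5 * math.erf(abs(z) / math.sqrt(2)), z)


def class_proportion(lower, upper):
    """Proportionate area in one class (Column 7): a difference of cumulative areas.
    For the class containing the mean this adds the two areas on either side."""
    return signed_area_from_mean(upper) - signed_area_from_mean(lower)


# Assumed class limits: 5-15, 15-25, ..., 155-165 feet.
limits = list(range(5, 166, 10))
expected = {
    (lo, lo + 10): N * class_proportion(lo, lo + 10)   # Column (8)
    for lo in limits[:-1]
}

if __name__ == "__main__":
    for (lo, hi), freq in expected.items():
        print(f"{lo:3d} but under {hi:3d}: {freq:6.1f}")
```

Working with signed areas removes the need to treat the classes above and below the mean separately, though the arithmetic is the same as in Steps 4 to 8.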

A comparison of the expected frequencies, shown in Column (8) of
Table 23.3, with the observed frequencies of Table 23.1 reveals a general
agreement of the figures, the difference being greatest for the class "85
but under 95 feet." A test of the "goodness of fit" of the normal curve
will be described in Chapter 25.

The normal curve and collar sizes. To illustrate another use of the
normal curve, let us assume that a maker of collars is considering the
production of a collar styled especially for college men. Consideration
will, of course, be given to the number of collars of each size which should
be made. Since college men represent a selected group, it would be
desirable to adjust the manufacturing schedule to their particular requirements.
Extensive data on the circumference of the necks of college men
are not available, but Table 23.4 shows the neck measurements of 231
male college students. To fit a normal curve, we need $\bar{X} = 14.232$ inches
and $s = 0.719$ inches. The column diagram of the observed data and the
fitted curve are shown in Chart 23.8.

FITTED FREQUENCY CURVES (Chap. 23)

[Chart 23.8. Normal Curve Fitted to Neck Circumference of 231
Male College Students. Based on data of Table 23.4. Vertical axis:
number of students.]

Our problem, in this instance, is not to determine the expected proportion
of college men having necks "12.75 but under 13.25" inches in circumference,
"13.25 but under 13.75" inches in circumference, and so
forth, but rather to determine the number of collars of each size (by half
sizes) which should be made. Experience shows that, on the average,
collars are worn about ¾ of an inch larger than the circumference of the
neck. This means that collars size 14 would be worn by men whose necks
averaged 13.25 inches, and, since we are dealing with half sizes, the necks
would range from 13 to 13.5 inches in circumference. The first column
of Table 23.5 lists the collar sizes, while the second column shows the
corresponding neck circumferences. It is for these classes that we need

TABLE 23.4

Neck Circumference of 231 Male College Students

  Mid-values (in inches)    Number of students
          12.5                      4
          13.0                     19
          13.5                     30
          14.0                     63
          14.5                     66
          15.0                     29
          15.5                     18
          16.0                      1
          16.5                      1
          Total                   231

Source of data confidential.
Determination of Expected Distribution of Collar Sizes for Male College Students

to ascertain the theoretical frequencies. This is done in the remainder
of the columns, and the expected frequencies ($N = 1{,}000$) are shown in
Column (9). If our basic data are representative, there would be about
270 customers in a thousand calling for size 15 collars, 221 asking for size
14½, 213 requesting size 15½, and so on. It is interesting to observe that
we might expect only 8 out of a thousand of this group to ask for size 13
or smaller and but 7 out of a thousand to require 17 or larger.
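
A rough sketch of the collar-size computation follows. It assumes, as the text describes, that a collar of a given size is worn by men whose necks run from one inch to one-half inch less than the size; the function names are hypothetical, and the areas are computed with the error function (with $x/s$ rounded to two decimals to imitate a table lookup) rather than read from Appendix E.

```python
import math

MEAN, S = 14.232, 0.719   # neck circumference in inches, as given in the text


def signed_area_from_mean(limit):
    """Area between the mean and a neck-size limit; negative below the mean."""
    z = round((limit - MEAN) / S, 2)          # two-decimal x/s, as in a table lookup
    return math.copysign(0.5 * math.erf(abs(z) / math.sqrt(2)), z)


def per_thousand(collar_size):
    """Expected customers per 1,000 for a collar size, taking the neck to run
    from (size - 1.0) to (size - 0.5) inches."""
    lower, upper = collar_size - 1.0, collar_size - 0.5
    return 1000 * (signed_area_from_mean(upper) - signed_area_from_mean(lower))


if __name__ == "__main__":
    for size in (14.0, 14.5, 15.0, 15.5, 16.0):
        print(f"size {size:4.1f}: {per_thousand(size):6.1f} per thousand")
```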

Suitability of the normal curve. As previously pointed out, the 
normal curve is only one of a number of kinds of curves which may be 

TABLE 23.6

Cumulative Distribution of Baseball Throws for Distance by 303
First-Year High School Girls

  Distance in feet    Number of girls    Per cent of total
  Less than  25              1                  0.33
  Less than  35              3                  0.99
  Less than  45             10                  3.30
  Less than  55             35                 11.55
  Less than  65             68                 22.44
  Less than  75            121                 39.93
  Less than  85            185                 61.06
  Less than  95            229                 75.58
  Less than 105            260                 85.81
  Less than 115            287                 94.72
  Less than 125            298                 98.35
  Less than 135            302                 99.67
  Less than 145            303                100.00

Cumulated from data of Table 23.1.




fitted to a frequency distribution. It should in no sense be thought of as
a form having general applicability to all distributions. Since this is
true, what guides are there which will tell us when to fit a normal curve,
or, when fitted, if it is suitable?

1. The plotted curve or column diagram of the sample distribution
serves as a very crude guide. If there is marked skewness present, it will
be apparent, as will also any irregularities.

2. The sample data may be cumulated and put into percentage form,
as in Table 23.6; these cumulative percentages may then be plotted on
arithmetic probability paper,⁵ as in Chart 23.9. If the resulting curve is
approximately a straight line, we may proceed with assurance to fit a
normal curve.

⁵ The vertical scale is so designed that the ogive of a normal curve will appear as a
straight line.
3. The values of $\beta_1$ and $\beta_2$ may be computed as described in Chapter
10, and, by methods which are set forth in Chapter 26, we may ascertain
whether $\beta_1$ differs significantly from zero and whether $\beta_2$ differs significantly
from 3.0. For the throws of a baseball by high school freshman
girls, $\beta_1 = 0.0101$ and $\beta_2 = 2.7724$. Neither of these values differs
significantly from the value for a normal curve.

4. After the curve has been fitted and the expected frequencies have
been determined for the various classes, a test of "goodness of fit"
may be made. This test is described in Chapter 25, and indicates that
the fit of the normal curve to the data of baseball throws by girls is
satisfactory.
BINOMIALS

It was previously shown that the expansion of a symmetrical binomial
$(\tfrac{1}{2} + \tfrac{1}{2})^N$ may be approximated experimentally by tossing coins. An
asymmetrical binomial may be expanded experimentally in a similar
fashion.

Experimental construction of skewed binomials. Let us consider,
first, a single die, four sides of which are colored black. If we toss
this die, it is apparent that the probability ($\pi$) of having a white side
come up is 1 out of 3, or $\tfrac{1}{3}$, while the probability ($\tau = 1 - \pi$) of
obtaining a black side is 2 out of 3, or $\tfrac{2}{3}$.
Using $A$ (which has no numerical value) to indicate the occurrence of a
white side and $B$ (which also has no numerical value) to indicate the
non-occurrence of a white side, that is, the occurrence of a black side, we may
express the situation as

$$\tau B + \pi A \quad \text{or} \quad \tfrac{2}{3}B + \tfrac{1}{3}A,$$

which indicates that, if the die (assumed to be perfectly balanced) is
tossed 1,500 times, we should expect a black side to appear 1,000 times
and a white side 500 times.

[Chart 23.9. Cumulative Distribution of Baseball Throws for Distance
by First-Year High School Girls, Shown on Arithmetic Probability
Paper. Based on data of Table 23.6.]
If, now, we toss two dice (each having four black sides), there may
appear either no white faces (2 black faces), one white face (a white face
and a black face), or two white faces. The expression is

$$(\tfrac{2}{3}B + \tfrac{1}{3}A)^2 = \tfrac{4}{9}B^2 + \tfrac{4}{9}BA + \tfrac{1}{9}A^2.$$

Therefore, if 1,800 throws are made, we should expect to obtain no white
faces 800 times, one white face 800 times, and two white faces 200 times.

[Chart 23.10. Expected Results of 59,049 Throws of 10 Dice, Each Having
Four Black Sides and Two White Sides. The expected occurrences are
given by

$$(\tfrac{2}{3}B + \tfrac{1}{3}A)^{10} = \tfrac{1}{59{,}049}\left(1{,}024B^{10} + 5{,}120B^{9}A + 11{,}520B^{8}A^{2} + 15{,}360B^{7}A^{3} + 13{,}440B^{6}A^{4} + 8{,}064B^{5}A^{5} + 3{,}360B^{4}A^{6} + 960B^{3}A^{7} + 180B^{2}A^{8} + 20BA^{9} + A^{10}\right).$$

Horizontal axis: 10B, 9B, 8B, 7B, 6B, 5B, 4B, 3B, 2B, 1B, 0B.]

If three such dice are thrown, the expression is

$$(\tfrac{2}{3}B + \tfrac{1}{3}A)^3 = \tfrac{8}{27}B^3 + \tfrac{12}{27}B^2A + \tfrac{6}{27}BA^2 + \tfrac{1}{27}A^3.$$

It will be observed that the binomial is beginning to show its skewed
nature. This will be more clearly seen if we consider throwing ten dice,
each with four black sides. The expression is $(\tfrac{2}{3}B + \tfrac{1}{3}A)^{10}$, which is
shown graphically in Chart 23.10. The curve is definitely skewed as a
result of the fact that $\tau$ and $\pi$ are unequal.
If $\tau$ is a larger fraction and $\pi$ is smaller, the skewness will be even
greater. Let us consider as an illustration a four-sided
pyramidal die with one white side and three
black sides. It will be necessary to consider the
"down" side as the one obtained at a throw. For
throwing one die, the expression is $\tfrac{3}{4}B + \tfrac{1}{4}A$.

If 10 of these four-sided dice are thrown, their
behavior is indicated by $(\tfrac{3}{4}B + \tfrac{1}{4}A)^{10}$. The expansion
of this binomial is shown in Chart 23.11, which is
noticeably more skewed than the curve of Chart 23.10.
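
The expansions discussed above can be reproduced exactly with rational arithmetic. The following is a minimal sketch, with the hypothetical helper `binomial_terms`; it evaluates the terms of $(\tau B + \pi A)^N$ from the binomial coefficients rather than by experiment.

```python
from fractions import Fraction
from math import comb


def binomial_terms(n, pi):
    """Exact terms of (tau*B + pi*A)**n, ordered from 0 white faces up to n,
    where pi is the probability of a white face and tau = 1 - pi."""
    tau = 1 - pi
    return [comb(n, k) * tau ** (n - k) * pi ** k for k in range(n + 1)]


# Two six-sided dice, each with four black sides, thrown 1,800 times.
two_dice = [t * 1800 for t in binomial_terms(2, Fraction(1, 3))]

# Ten such dice: numerators over the common denominator 3**10 = 59,049.
ten_dice = [t * 59049 for t in binomial_terms(10, Fraction(1, 3))]

# Ten four-sided dice, each with one white side: denominator 4**10 = 1,048,576.
ten_pyramids = [t * 1048576 for t in binomial_terms(10, Fraction(1, 4))]

if __name__ == "__main__":
    print([int(v) for v in two_dice])
    print([int(v) for v in ten_dice])
    print([int(v) for v in ten_pyramids])
```

Using `Fraction` keeps every term exact, so the numerators can be compared directly with those shown on the charts.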

[Figure: A Four-Sided Die, Each Side of Which Is an Equilateral Triangle.]

[Chart 23.11. Expected Results of 1,048,576 Throws of 10 Four-Sided
Dice, Each Having Three Black Sides and One White Side. The expected
occurrences are given by

$$(\tfrac{3}{4}B + \tfrac{1}{4}A)^{10} = \tfrac{1}{1{,}048{,}576}\left(59{,}049B^{10} + 196{,}830B^{9}A + 295{,}245B^{8}A^{2} + 262{,}440B^{7}A^{3} + 153{,}090B^{6}A^{4} + 61{,}236B^{5}A^{5} + 17{,}010B^{4}A^{6} + 3{,}240B^{3}A^{7} + 405B^{2}A^{8} + 30BA^{9} + A^{10}\right).$$

Vertical axis: occurrences in thousands.]

Fitting a binomial. It is apparent from the expression for a
binomial that it is a device most useful for fitting to discrete data. In
order to fit a binomial to a series of observed data, the following three
steps are necessary: (1) Determine the proper value of $\pi$, which also gives
us $\tau$, since $\tau = 1 - \pi$. The size of $\pi$ determines the degree of skewness
of the curve. If $\pi = 0.50$, then $\tau = 0.50$ and the curve is symmetrical.
The farther removed $\pi$ is from 0.50, in either direction, the greater the
skewness. If $\pi < 0.50$, the curve is positively skewed; if $\pi > 0.50$, it is
negatively skewed. When population values ($\pi$ and $\tau$) are not known,
or when a reasonable assumption concerning them cannot be made, we
have no alternative but to employ proportions determined from the
sample. These we call $p$ and $q$. (2) Expand the binomial $(\tau + \pi)^N$ or
$(q + p)^N$, where $N$ = the number of categories minus one, since there are

TABLE 23.7

Number of Male Pigs Born in Litters of Five

  Number of males    Number of litters having specified number of males
         0                          2
         1                         20
         2                         41
         3                         35
         4                         14
         5                          4
       Total                      116

Data from A. S. Parkes, "Studies on the Sex-Ratio and Related Phenomena.
The Frequencies of Sex Combinations in Pig Litters," Biometrika, Vol. 15
(1923), pp. 373-381. [...]

$N + 1$ terms in the expanded binomial. $N$ is also the number of items
in a sample. (3) Multiply each of the fractions of the expanded binomial
by $k$, the number of samples.

Table 23.7 shows a distribution of the number of male pigs occurring in
litters of five pigs. The data are for 116 such litters; so $N = 5$ and
$k = 116$. Altogether there are $5 \times 116 = 580$ pigs of both sexes and
$(0 \times 2) + (1 \times 20) + (2 \times 41) + (3 \times 35) + (4 \times 14) + (5 \times 4) = 283$
male pigs. The proportion of male pigs, $p$, is therefore

$$p = \frac{283}{580} = 0.4879$$

and $q = 0.5121$.

As pointed out above, the fitting is accomplished by expanding $k(q + p)^N$.
Substituting 5 for $N$, but retaining the other symbols, we have

$$k(q + p)^5 = k(q^5 + 5q^4p + 10q^3p^2 + 10q^2p^3 + 5qp^4 + p^5),$$

where the exponent of $p$ indicates the number of males born in a litter of 5.

The numerical expression to use in fitting the binomial is $(0.5121 + 0.4879)^5$,
and, since $k = 116$, we should expand $116(0.5121 + 0.4879)^5$.

[Chart 23.12. Binomial Fitted to Distribution of Number of Male
Pigs Born in Litters of Five. Data from Tables 23.7 and 23.8. Vertical
axis: number of litters.]

This becomes

$$116[(0.5121)^5 + 5(0.5121)^4(0.4879) + 10(0.5121)^3(0.4879)^2 + 10(0.5121)^2(0.4879)^3 + 5(0.5121)(0.4879)^4 + (0.4879)^5].$$

The computations are most conveniently carried out by means of loga- 
rithms, as shown in Table 23.8. Although the powers could be obtained 
and the multiplications could be performed for this problem by the use 
of a calculating machine, the use of logarithms is essential when a bino- 
mial is raised to an appreciably higher power. 

Chart 23.12 shows the observed and the expected frequencies. The 
observed data have been presented by means of separated bars to suggest 
the discrete nature of the series. A test of “goodness of fit,” similar to 
that described in Chapter 25, indicates good agreement between the
observed and expected frequencies. 
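
The three fitting steps may be sketched as follows for the litter data. This is an illustration only: the variable names are hypothetical, the observed counts are those quoted in the text, and the powers are computed directly in floating point rather than by logarithms.

```python
from math import comb

males = [0, 1, 2, 3, 4, 5]           # number of males in a litter of five
litters = [2, 20, 41, 35, 14, 4]     # observed number of litters (from the text)

N = 5                                 # items per sample (pigs per litter)
k = sum(litters)                      # number of samples (litters)

# Step 1: estimate p from the sample, since the population value is not known.
p = sum(m * f for m, f in zip(males, litters)) / (N * k)
q = 1 - p

# Steps 2 and 3: expand (q + p)**N and multiply each term by k.
expected = [k * comb(N, m) * q ** (N - m) * p ** m for m in males]

if __name__ == "__main__":
    print(f"p = {p:.4f}, q = {q:.4f}")
    for m, obs, exp in zip(males, litters, expected):
        print(f"{m} males: observed {obs:3d}, expected {exp:6.2f}")
```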
It should not be assumed that all discrete series may be fitted by the
method just explained. Some data are better described by other distributions,
as, for example, the Poisson, the fitting of which is described
elsewhere by one of the writers.⁶

SKEWED CURVES 

The binomials just discussed are suitable for fitting to discrete data, but 
are not accurate enough to use with continuous data. A fitted binomial 
consists of a series of ordinates erected at specific points on the $X$-axis
(see Chart 23.12). If this procedure were applied to a distribution of
continuous data (or to discrete data where the $X$ units are small in relation to
the class interval), we should be erecting an ordinate at the mid-value of
each class, instead of determining the area under a smooth curve. Obviously,
the greater the number of classes, the less would be the difference
between these two procedures.

[Chart 23.13. Logarithmic Normal Curve Fitted to Kilowatt Hours
of Electricity Used per Month in 282 Medium-Class Homes in an Eastern
City. Based on data of Table 23.9. Vertical axis: number of homes.]

There are a great many types of skewed curves which may be fitted to 
frequency distributions. It is the purpose of this volume, not to enter 
into an extended consideration of this topic, but merely to sketch briefly 
the procedure involved in fitting two of the simpler types.⁷

The logarithmic normal curve. Some distributions which are 
skewed to the right become symmetrical when plotted in terms of the 

⁶ See the reference given at the end of note 4.

⁷ For a more detailed discussion, see: W. P. Elderton, Frequency Curves and Correlation,
Cambridge University Press, Cambridge, England, 1953 (4th edition); H. L.
Rietz, Mathematical Statistics, Open Court Publishing Co., Chicago, 1927; Arne
Fisher, Mathematical Theory of Probabilities, The Macmillan Company, New York,
1922 (2nd edition).
TABLE 23.9

Kilowatt Hours of Electricity Used per Month in Medium-Class Homes in
an Eastern City

  Kilowatt hours (mid-values)    Number of homes
              10                       25
              14                       50
              18                       53
              22                       48
              26                       36
              30                       26
              34                       19
              38                        8
              42                        6
              46                        3
              50                        4
              54                        2
              58                        2
             Total                    282

Data from Electrical Testing Laboratories, New York City. Name of city
withheld by request.
[Chart 23.15. Kilowatt Hours of Electricity Used per Month in
282 Medium-Class Homes in an Eastern City. Shown on logarithmic
probability paper. Based on data of Table 23.9.]


data have been re-plotted but against a logarithmic $X$-scale. When the
curve is extended to the horizontal axis at $X = 6$ kilowatt hours (the
class just below the first one shown in the table), the approximate symmetrical
nature of the series in terms of logarithmic $X$ values is apparent.
A further indication of this is shown in Chart 23.15, which presents the
cumulative percentage frequencies plotted on logarithmic probability
paper.
Fitting a logarithmic normal curve. The procedure for fitting a
logarithmic normal curve has been given by Davies⁸ and is essentially the
same process as that of fitting a normal curve, except that we use the
arithmetic mean $\bar{X}_{\log}$ and the standard deviation $s_{\log}$ of the logarithms of
the $X$ values. The values of $\bar{X}_{\log}$ and $s_{\log}$ may be computed by making
use of the mid-values of the logarithms of the class limits. Ideally the
classes should be so chosen that the class intervals are equal in a logarithmic
sense, thus making the logarithmic mid-values equidistant from
each other. Usually we deal with ready-formed frequency distributions
of arithmetically equal class intervals, and with such distributions the
direct computation of $\bar{X}_{\log}$ and $s_{\log}$ is laborious. The inconvenience of
computing these logarithmic values has been eliminated by Davies, who
gives formulas based upon the quartiles, which are readily computed.
Furthermore, according to Davies, there are certain advantages to the
procedure. He says: "Unless the data are very regular, these ($\bar{X}_{\log}$ and
$s_{\log}$) may be more satisfactorily computed from the quartiles, thus avoiding
the disturbing effects of irregular extreme items." The expressions are
given below.

$$\bar{X}_{\log} = \frac{\log Q_1 + \log Q_3 + 1.2554 \log Q_2}{3.2554}.$$

This is the weighted average of the three quartiles, the weights being
proportional to the heights of normal-curve ordinates erected at these
values.

$$s_{\log} = 0.7413(\log Q_3 - \log Q_1).$$

This expression grows out of the fact that in a normal curve 50 per cent
of the items are included within $\pm Q$ of the median (or mean), and also
that 50 per cent of the items are included within $\pm 0.6745s$ of the mean.
It is therefore obvious that

$$Q = 0.6745s.$$

Since

$$Q = \frac{Q_3 - Q_1}{2},$$

it follows that

$$Q_3 - Q_1 = 2Q, \text{ and } s = 0.7413(Q_3 - Q_1).$$

For the data of electric consumption, $Q_1 = 15.6400$ kwh., $Q_2$ (the
median) $= 21.0833$ kwh., and $Q_3 = 27.9444$ kwh.

⁸ G. R. Davies and W. F. Crowder, Methods of Statistical Analysis, pp. 303-306;
and G. R. Davies, "The Analysis of Frequency Distributions," Journal of the American
Statistical Association, Vol. 24, December 1929, pp. 349-366.
$$\bar{X}_{\log} = \frac{\log 15.6400 + \log 27.9444 + 1.2554 \log 21.0833}{3.2554}
= \frac{1.194237 + 1.446295 + 1.2554(1.323939)}{3.2554}
= \frac{4.302605}{3.2554} = 1.321682.$$

$$s_{\log} = 0.7413(\log 27.9444 - \log 15.6400)
= 0.7413(1.446295 - 1.194237)
= 0.7413(0.252058)
= 0.186851.$$
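
The quartile formulas lend themselves to a very short sketch. The function name `lognormal_parameters` is hypothetical; the constants 1.2554, 3.2554, and 0.7413 are those given in the text, and common (base-10) logarithms are assumed, in keeping with the worked figures.

```python
from math import log10


def lognormal_parameters(q1, q2, q3):
    """Mean and standard deviation of the logarithms, estimated from the
    three quartiles by the weighted-average and quartile-range formulas."""
    mean_log = (log10(q1) + log10(q3) + 1.2554 * log10(q2)) / 3.2554
    s_log = 0.7413 * (log10(q3) - log10(q1))
    return mean_log, s_log


# Quartiles of the electricity data, in kilowatt hours (from the text).
mean_log, s_log = lognormal_parameters(15.6400, 21.0833, 27.9444)

if __name__ == "__main__":
    print(f"mean of logs = {mean_log:.6f}, s of logs = {s_log:.6f}")
```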
Using these two values, the expected frequencies in each class may be
determined in a manner strictly parallel to that previously described for
the normal curve. As before, Appendix E is used and the procedure is
set forth in Table 23.10.

The ordinates are computed from the expression⁹

$$Y_c = \frac{0.4343\,Ni}{2.5066\,X s_{\log}}\; 2.71828^{-x_{\log}^2/2s_{\log}^2},$$

which may be simplified for purposes of computation to

$$Y_c = \frac{0.17326\,Ni}{X s_{\log}}\; 2.71828^{-x_{\log}^2/2s_{\log}^2}.$$

$X$ is the arithmetic value of the point on the $X$-axis at which the ordinate
is to be erected. The values of $2.71828^{-x_{\log}^2/2s_{\log}^2}$ are obtained from Appendix
⁹ It will be recalled that the expression for the normal curve is

$$Y_c = \frac{Ni}{2.5066s}\; 2.71828^{-x^2/2s^2}.$$

For fitting the logarithmic normal curve, the expression cannot be used in this form,
since $s$ is in terms of logarithms ($s_{\log}$), while the class intervals $i$ are equal arithmetically.
We therefore multiply $i$ by the adjustment factor $\frac{\log_{10} e}{X}$ or $\frac{0.4343}{X}$ to compensate
for the fact that the intervals are not geometrically equal. We thus have

$$Y_c = \frac{Ni\,\dfrac{0.4343}{X}}{2.5066\,s_{\log}}\; 2.71828^{-x_{\log}^2/2s_{\log}^2}.$$

TABLE 23.10

Determination of Expected Frequencies for Logarithmic Normal Curve Fitted to Data of Kilowatt Hours of Electricity
Used per Month in Medium-Class Homes in an Eastern City

D and the $\frac{x_{\log}}{s_{\log}}$ values are given by

$$\frac{x_{\log}}{s_{\log}} = \frac{\log X - \bar{X}_{\log}}{s_{\log}}.$$

The procedure for determining the ordinates parallels that for the normal
curve which was shown in Table 23.2. The fitted curve is shown in
Chart 23.13 and the correspondence between that curve and the column
diagram is apparent.

Davies suggests a logarithmic coefficient of skewness

$$\frac{\log Q_1 + \log Q_3 - 2\log Q_2}{\log Q_3 - \log Q_1}$$

and points out that a series which yields a coefficient of less than 0.15
(or perhaps even 0.20) may tentatively be considered as logarithmically
normal. If, however, a skewed distribution is not inherently logarithmic,
Davies notes that it may sometimes be adjusted by shifting the $X$ values
until the desired skewness is obtained; after fitting, the $X$ values are again
shifted. This correction $c$ is obtained by

$$c = \frac{Q_2^2 - Q_1Q_3}{Q_1 + Q_3 - 2Q_2}.$$

This value is added to the class limits and to the quartiles, after which
$\bar{X}_{\log}$ and $s_{\log}$ are computed. The fitting proceeds as in Table 23.10, but
the shifted class limits are used. After the expected frequencies have
been ascertained, the class limits are shifted back to their original values.
It is obvious that this device extends the usefulness of the logarithmic
normal curve.
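
A small sketch may help in applying the coefficient. The function names are hypothetical. The shift constant used here, $(Q_2^2 - Q_1Q_3)/(Q_1 + Q_3 - 2Q_2)$, is an assumption about the intended correction: it is the value of $c$ obtained algebraically by requiring $(Q_1 + c)(Q_3 + c) = (Q_2 + c)^2$, that is, by requiring the logarithmic coefficient of the shifted quartiles to be zero.

```python
from math import log10


def log_skewness(q1, q2, q3):
    """Logarithmic coefficient of skewness computed from the three quartiles."""
    return (log10(q1) + log10(q3) - 2 * log10(q2)) / (log10(q3) - log10(q1))


def shift_constant(q1, q2, q3):
    """Shift c that makes the logarithmic skewness of the shifted quartiles zero
    (assumed form; derived from (q1 + c)(q3 + c) = (q2 + c)**2)."""
    return (q2 ** 2 - q1 * q3) / (q1 + q3 - 2 * q2)


# Quartiles of the electricity data, in kilowatt hours (from the text).
q1, q2, q3 = 15.6400, 21.0833, 27.9444
coefficient = log_skewness(q1, q2, q3)
c = shift_constant(q1, q2, q3)
shifted = log_skewness(q1 + c, q2 + c, q3 + c)

if __name__ == "__main__":
    print(f"coefficient = {coefficient:.4f}, c = {c:.3f}, after shift = {shifted:.2e}")
```

For the electricity data the unshifted coefficient is already well inside the 0.15 limit mentioned in the text, so no shift would be called for; the shift is computed here only to show the mechanics.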

Fitting a normal curve with adjustment for skewness. The
formulas previously given for the normal curve enabled us to fit a symmetrical
curve from a knowledge of $N$, $\bar{X}$, and $s$. We have just considered
one method of fitting a skewed curve. Another procedure that
is useful for certain skewed distributions consists of using also a measure
of skewness $\alpha_3 = \sqrt{\beta_1}$ and thereby making a correction to the fit of a
normal curve. This is sometimes referred to as a second approximation
curve. The equation¹⁰ is

$$Y_c = \frac{Ni}{2.5066s}\; 2.71828^{-x^2/2s^2} - \frac{Ni}{2.5066s}\; 2.71828^{-x^2/2s^2}\left\{\frac{\alpha_3}{2}\left(\frac{x}{s} - \frac{x^3}{3s^3}\right)\right\}.$$

¹⁰ The expression includes the first two terms of the Gram-Charlier series. For a
further discussion, see W. A. Shewhart, Economic Control of Quality of Manufactured
Product, pp. 84-94, D. Van Nostrand Company, Inc., New York, 1931.





TABLE 23.11

Computation of $\bar{X}$, $s$, and $\alpha_3$ for Depth of Sapwood

  Depth in inches
   (mid-values)       f       d'       fd'      f(d')^2     f(d')^3
       1.0            2       -7       -14          98        -686
       1.3           29       -6      -174       1,044      -6,264
       1.6           62       -5      -310       1,550      -7,750
       1.9          106       -4      -424       1,696      -6,784
       2.2          153       -3      -459       1,377      -4,131
       2.5          186       -2      -372         744      -1,488
       2.8          193       -1      -193         193        -193
       3.1          188        0         0           0           0
       3.4          151        1       151         151         151
       3.7          123        2       246         492         984
       4.0           82        3       246         738       2,214
       4.3           48        4       192         768       3,072
       4.6           27        5       135         675       3,375
       4.9           14        6        84         504       3,024
       5.2            5        7        35         245       1,715
       5.5            1        8         8          64         512
      Total       1,370               -849      10,339     -12,249

Data from W. A. Shewhart, Economic Control of Quality of Manufactured Product,
p. 77, D. Van Nostrand Co., New York, 1931. Courtesy of D. Van Nostrand Co., Inc.

$$\nu_1 = \frac{\Sigma fd'}{N} = -0.619708.$$

$$\nu_2 = \frac{\Sigma f(d')^2}{N} = 7.546715.$$

$$\nu_3 = \frac{\Sigma f(d')^3}{N} = -8.940876.$$

$$\bar{X} = 3.1 + [(-0.619708)(0.3)] = 2.9141 \text{ inches.}$$

Since Sheppard's correction is not applied, we have

$$\pi_2 = \nu_2 - \nu_1^2 = 7.162677.$$

$$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = 4.613422.$$

$$s = i\sqrt{\pi_2} = 0.8029 \text{ inches.}$$

$$\alpha_3 = \sqrt{\beta_1} = \frac{\pi_3}{\sqrt{\pi_2^3}} = +0.2407.$$
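
The moment computations can be verified with a brief sketch. The variable names are hypothetical; the frequencies, the assumed origin of 3.1 inches, and the class interval of 0.3 inch are those of the sapwood data, and Sheppard's correction is omitted, as in the text.

```python
import math

# Frequencies for mid-values 1.0, 1.3, ..., 5.5 inches (from the sapwood table).
freqs = [2, 29, 62, 106, 153, 186, 193, 188, 151, 123, 82, 48, 27, 14, 5, 1]
steps = list(range(-7, 9))    # d': deviations from the assumed origin, in class intervals
ORIGIN = 3.1                  # assumed origin, in inches
INTERVAL = 0.3                # class interval i, in inches

N = sum(freqs)

# Moments about the assumed origin, in class-interval units.
nu1 = sum(f * d for f, d in zip(freqs, steps)) / N
nu2 = sum(f * d ** 2 for f, d in zip(freqs, steps)) / N
nu3 = sum(f * d ** 3 for f, d in zip(freqs, steps)) / N

# Moments about the mean (no Sheppard's correction).
pi2 = nu2 - nu1 ** 2
pi3 = nu3 - 3 * nu1 * nu2 + 2 * nu1 ** 3

mean = ORIGIN + nu1 * INTERVAL
s = INTERVAL * math.sqrt(pi2)
alpha3 = pi3 / pi2 ** 1.5

if __name__ == "__main__":
    print(f"N = {N}, mean = {mean:.4f}, s = {s:.4f}, alpha3 = {alpha3:+.4f}")
```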

The expression preceding the minus sign is that for the normal curve,
while the expression in braces represents the modification for skewness.
In order to determine the expected frequencies, the above equation must
be integrated. This is accomplished by the use of tables. To use these
tables, we write

$$F_1 - \alpha_3F_2,$$

where $F_1$ represents the areas of the normal curve (given in Appendix
E) and $\alpha_3F_2$ represents the modification for skewness. Values of
$F_2$ are obtained from Appendix F and are then multiplied by $\alpha_3$.

As an illustration of this method of fitting, we use the data of Table
23.11, which are shown graphically in Chart 23.16. The fitting
procedure* for a second approximation curve is shown in Table 23.12. The
values of $N$, $\bar{X}$, $s$, and $\alpha_3$ having been obtained (Table 23.11), the steps
are as follows:

Chart 23.16. Second Approximation Curve Fitted to Depth of Sapwood.
Based on data of Table 23.11.


1. Make entries in Columns (1) to (6) inclusive, as was done in fitting
a normal curve.

2. Refer to Appendix F and enter in Column (7) the $F_2$ values
associated with each $x/s$ value of Column (5). Negative signs are entered
in this column for the percentages associated with class limits of Column
(2).

3. In Column (8), multiply each value of Column (7) by $\alpha_3$. Signs
are shown.

4. To produce Column (9), the values in Column (8) are subtracted
algebraically from the values in Column (6).

5. The cumulative proportionate frequencies of Column (9) are
decumulated in Column (10), as was done for the normal curve. The
result is a series of figures showing expected frequencies on the basis of the
second approximation for $N = 1.0000$. One of the shortcomings of this
curve is that it may occasionally produce negative frequencies at one end,
or, if we do not extend the fit far enough to produce these negative
frequencies, the total may slightly exceed 1.0000. In this instance Column
(10) totals 1.0002.

6. In Column (11) the expected frequencies are prorated among the
classes so that the total equals $N$ for the sample.

* Sheppard's correction has not been applied in the computation of the second
moment, partly because high contact is not present at the left in Chart 23.16.
Furthermore, Shewhart points out (op. cit., p. 78) that the corrected standard deviation
(0.798211) differs more from the standard deviation of the ungrouped data (0.802555)
than does the uncorrected standard deviation (0.802895). When high contact is not
present at both ends of a distribution, overcorrection of a moment is not unusual. It
arises because the corrections allow for non-existent classes at the extremes.
$\beta_{1\mathcal{P}}$: lower-case Greek beta; skewness in a population.

$\beta_{1\bar{X}}$: skewness of the distribution of sample $\bar{X}$ values.

$\beta_{2\mathcal{P}}$: kurtosis in a population.

$\beta_{2\bar{X}}$: kurtosis of the distribution of sample $\bar{X}$ values.

$D$: a difference between paired values.

$d'$: deviation, in terms of class intervals, of $X$ from $\bar{X}_d$.

$F$: see Chapter 26.

$f$: frequency.

$k$: number of samples. $k$ will ordinarily be much smaller than $K$.

$K$: the number of possible samples of a given size from a population.

$n$: degrees of freedom in a sample. When two samples are under
consideration, $n = n_1 + n_2$.

$N$: the number of items in a sample.

$P$: probability; varies from 0 to 1.

$\mathcal{P}$: the number of items in a population. As a subscript, $\mathcal{P}$ means
“population,” thus $\bar{X}_{\mathcal{P}}$ is the arithmetic mean of a population.

$r$: the correlation coefficient.

$s$: the standard deviation of a sample.

$\sigma$: lower-case Greek sigma; the standard deviation of a population.

$\hat{\sigma}$: the estimated standard deviation of a population, computed from a
single sample. Referred to as “sigma caret” or “sigma hat.”

$\hat{\sigma}_1$ is an estimate based on sample 1.

$\hat{\sigma}_2$ is an estimate based on sample 2.

$\hat{\sigma}_{1+2}$ is an estimate computed by pooling values and degrees of
freedom from two samples.

$\hat{\sigma}_D$: the estimated population standard error for a series of $D$ values.

$\sigma_{\bar{X}}$: the standard error of $\bar{X}$. When two samples are under consideration,
we use $\sigma_{\bar{X}_1}$ and $\sigma_{\bar{X}_2}$.

$\hat{\sigma}_{\bar{X}}$: the estimated standard error of $\bar{X}$.

$\hat{\sigma}_{\bar{X}_1 - \bar{X}_2}$: the estimated standard error of the difference between two sample
arithmetic means.

$\hat{\sigma}_{\bar{X}_D}$: the estimated standard error of $\bar{X}_D$.

$\Sigma$: upper-case Greek sigma, meaning “take the sum of.”

$t$: $\dfrac{\bar{X} - \bar{X}_{\mathcal{P}}}{\hat{\sigma}_{\bar{X}}}$, $\dfrac{\bar{X}_1 - \bar{X}_2}{\hat{\sigma}_{\bar{X}_1 - \bar{X}_2}}$, $\dfrac{\bar{X}_D}{\hat{\sigma}_{\bar{X}_D}}$.

$x$: $X - \bar{X}$; also, $\bar{X} - \bar{X}_{\mathcal{P}}$ in the expression $\dfrac{x}{\sigma}$, which see.

$x_1$: a deviation of a value in series 1 from $\bar{X}_1$; $\Sigma x_1^2 = \Sigma (X_1 - \bar{X}_1)^2$.

$x_2$: a deviation of a value in series 2 from $\bar{X}_2$; $\Sigma x_2^2 = \Sigma (X_2 - \bar{X}_2)^2$.

$X$: an observed value in a sample.

$X_1$: an observed value in sample 1.

$X_2$: an observed value in sample 2.

$\bar{X}$: the arithmetic mean of a sample.

$\bar{X}_1$: the arithmetic mean of sample 1.

$\bar{X}_2$: the arithmetic mean of sample 2.

$\bar{X}_D$: the arithmetic mean of a series of $D$ values.

$\bar{X}_{\mathcal{P}}$: the arithmetic mean of a population.

$\bar{X}_{\mathcal{P}_1}$: the lower confidence limit of $\bar{X}_{\mathcal{P}}$.

$\bar{X}_{\mathcal{P}_2}$: the upper confidence limit of $\bar{X}_{\mathcal{P}}$.

$\dfrac{x}{\sigma}$: a deviation divided by its standard error, for example, $\dfrac{\bar{X} - \bar{X}_{\mathcal{P}}}{\sigma_{\bar{X}}}$.

$\chi$: lower-case Greek chi. See Chapter 25.

CHAPTER 24


Statistical Significance I: Arithmetic Means


In this and the two following chapters, we shall be interested in the
behavior of statistical measures computed from samples. This is an
important topic, since the statistical worker will nearly always be dealing
with data which constitute a sample rather than a population. Usually,
it is not possible to consider all of the items in a population. For
example, it would be utterly impracticable to attempt to obtain data of the
heights of all the adult males in the United States. If data of this sort
were needed, a much smaller expenditure of time and money would be
involved if a suitable sample were to be studied. Furthermore, the study
of a properly representative sample can be expected to give satisfactory
results, the reliability of which may be stated exactly.

In this book we shall consider only random samples.¹ Arithmetic
means will be discussed in the present chapter. Chapter 25 will deal with
proportions and with certain aspects of the $\chi^2$ (chi-square) test. Chapter
26 will discuss variances, the analysis of variance, correlation coefficients,
and measures of skewness and kurtosis.

HOW SAMPLE ARITHMETIC MEANS ARE DISTRIBUTED

Data of the mileage run by each of many thousands of automobile tires
of the same size, quality, and make, used on similar vehicles under
comparable road conditions, show an arithmetic mean ($\bar{X}_{\mathcal{P}}$) of 15,200 miles
and a standard deviation ($\sigma$) of 1,248 miles. If we select a random sample
of 25 tires, we would expect the arithmetic mean of the random sample
to be in the general neighborhood of 15,200 miles. A second random
sample of 25 items would not yield exactly the same arithmetic mean as
the first, but it, too, should be in the general neighborhood of 15,200.
Our first concern is with the behavior of arithmetic means of random
samples. Since we shall be dealing with only random samples, and since
we shall not be considering geometric, harmonic, or other means, we shall
simply say sample mean to refer to the arithmetic mean of a random
sample.

¹ A random sample was defined on page 20. The procedures for certain types of
non-random samples are given in H. M. Walker and J. Lev, Statistical Inference,
Henry Holt and Co., New York, 1953, pp. 171-178; additional references are given
on pages 177 and 178.

The arithmetic mean of sample means. If a number of random
samples, each of 25 tires, were to be taken from the tire population just
mentioned, some of the sample means would exceed 15,200 miles and some
would fall below 15,200 miles. One, or a very few, might happen to be
exactly 15,200 miles. The arithmetic mean of sample means would tend
to equal $\bar{X}_{\mathcal{P}}$.

Consider a more specific illustration: Walter A. Shewhart² constructed
a population of 998 items, having positive and negative values ranging
from -3.0 to 3.0, and with $\bar{X}_{\mathcal{P}} = 0$. It is not important at this point
that the population was as nearly normal as it was possible to make it.
From this population Shewhart drew 1,000 samples ($k = 1,000$) of 4 items
($N = 4$) each. The arithmetic mean of the 1,000 sample means was
0.014. If a larger number of sample means had been taken, it is
reasonable to believe that the arithmetic mean of the sample means would have
been more nearly zero, since it may be shown that, if all possible samples
($K$) of size $N$ are drawn from a population, the arithmetic mean of the
sample means will equal the population mean.³ That is,

$$\bar{X}_{\mathcal{P}} = \frac{\bar{X}_1 + \bar{X}_2 + \bar{X}_3 + \cdots + \bar{X}_K}{K}.$$
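The statement that the mean of all $K$ possible sample means equals the population mean can be verified exactly on a very small population. The sketch below is illustrative only: the five population values are invented for the purpose, and samples are taken without replacement, which is an assumption about what "all possible samples" means here.

```python
from fractions import Fraction
from itertools import combinations


def mean(values):
    """Exact arithmetic mean as a Fraction."""
    values = list(values)
    return Fraction(sum(values), len(values))


# Hypothetical population of five items (not from the text).
population = [2, 4, 6, 9, 14]
sample_size = 2

# Every possible sample of the given size, drawn without replacement.
sample_means = [mean(s) for s in combinations(population, sample_size)]

population_mean = mean(population)
mean_of_sample_means = mean(sample_means)

print("K =", len(sample_means))
print("population mean      =", population_mean)
print("mean of sample means =", mean_of_sample_means)
```

Exact fractions are used so that the equality is not blurred by floating-point rounding.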


Skewness of sample means. If sample means are from a population
which has no skewness, the distribution of sample means will not be
skewed. If the population is skewed, the distribution of sample means
will show less skewness, the skewness being inversely related to the size
of the sample, according to the relationship

$$\beta_{1\bar{X}} = \frac{\beta_{1\mathcal{P}}}{N}.$$

Shewhart's population of 998 items had $\beta_{1\mathcal{P}} = 0$. The distribution
of the 1,000 sample means, together with the population, is shown in
Chart 24.1. It may be seen that the distribution of the sample means is
nearly symmetrical. Shewhart does not compute the value of $\beta_{1\bar{X}}$ for
the 1,000 sample means, but for the frequency distribution in class
intervals of 0.25, shown in Chart 24.1, $\beta_{1\bar{X}}$ has been found to be 0.0027.

Chart 24.1. Distribution of Shewhart's Normal Population of 998 Items
and of 1,000 Sample Means for Samples Having N = 4. The class intervals
were 0.50 for the population and 0.25 for the sample means. Based on data from
W. A. Shewhart, Economic Control of Quality of Manufactured Product, D. Van Nostrand
Co., Inc., New York, 1931, pp. 167, 442-445, and 454-463.

² Walter A. Shewhart, Economic Control of Quality of Manufactured Product, D. Van
Nostrand Co., Inc., New York, 1931, pp. 167, 442-445, and 454-463.

³ See Appendix S, section 24.1.


Chart 24.2 shows the distribution of the arithmetic means of 100
samples of 10 items each and the distribution of the skewed population
from which the samples were drawn. For the population, $\beta_{1\mathcal{P}} = 0.096$.
If all possible samples of $N = 10$ had been drawn, the skewness of the
sample means would have been

$$\beta_{1\bar{X}} = \frac{\beta_{1\mathcal{P}}}{N} = \frac{0.096}{10} = 0.0096.$$

For the 100 samples, $\beta_{1\bar{X}} = 0.0031$. It is clear that the skewness of the
sample means is much less than the skewness in the population.
Shewhart⁴ has drawn samples from a population which is much more
skewed than that shown in Chart 24.2. His right-triangular universe and
the distribution of 1,000 sample means ($N = 4$) are shown in Chart 24.3.
The skewness of the right-triangular universe is indicated by $\beta_{1\mathcal{P}} = 0.320$.

Chart 24.2. Distribution of Skewed Population of 972 Items and of 100
Sample Means for Samples Having N = 10. The population consisted of the
weekly earnings of 972 wage earners. Class intervals were $2.50 for both series.

For samples of 4, we would expect the skewness to be about

$$\beta_{1\bar{X}} = \frac{\beta_{1\mathcal{P}}}{N} = \frac{0.320}{4} = 0.080.$$

For the distribution of the 1,000 sample means, the skewness has been
computed to be 0.062. While this value of $\beta_{1\bar{X}}$ is larger than those just
obtained for the other two sets of samples, it must be remembered, first,
that the skewness is much less than that of the population and, second,
that populations as skewed as this are not often encountered.
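The reduction of skewness by the factor $1/N$ can also be seen by simulation. The sketch below does not use Shewhart's 820 items; it assumes instead a continuous right-triangular distribution on the interval 0 to 1 with density $2(1 - x)$, for which $\beta_1$ is 0.32 in theory. The function names, the seed, and the number of draws are arbitrary illustrative choices.

```python
import math
import random


def beta1(values):
    """Skewness measure beta_1 = m3^2 / m2^3, using moments about the mean."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 ** 2 / m2 ** 3


def simulate(sample_size=4, draws=40000, seed=24):
    """Return (beta_1 of single items, beta_1 of sample means)."""
    rng = random.Random(seed)

    def item():
        # Inverse-CDF draw from the assumed density 2(1 - x) on [0, 1].
        return 1.0 - math.sqrt(rng.random())

    items = [item() for _ in range(draws)]
    means = [sum(item() for _ in range(sample_size)) / sample_size
             for _ in range(draws)]
    return beta1(items), beta1(means)


beta1_items, beta1_means = simulate()
print(f"beta_1 of single items : {beta1_items:.3f}  (theory 0.320)")
print(f"beta_1 of sample means : {beta1_means:.3f}  (theory 0.320 / 4 = 0.080)")
```

Because only a finite number of samples is drawn, the simulated values scatter around the theoretical ones, just as Shewhart's 0.062 differs from 0.080.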

Kurtosis of sample means.⁵ The kurtosis of a distribution of sample
means may be expected to be closer to 3.0 (the value for a normal
distribution) than the kurtosis of the population from which the samples
were taken. The relationship is

$$\beta_{2\bar{X}} - 3 = \frac{\beta_{2\mathcal{P}} - 3}{N}, \quad \text{or} \quad \beta_{2\bar{X}} = \frac{\beta_{2\mathcal{P}} - 3}{N} + 3.$$

For Shewhart's normal population, the value of $\beta_{2\mathcal{P}}$ was 3.0, and the
distribution of sample means (Chart 24.1) would be expected to have
$\beta_{2\bar{X}} = 3.0$. For Shewhart's 1,000 sample means, $\beta_{2\bar{X}}$ was 2.98.

⁴ The population data are from page 183 of the reference given in footnote 2. The
data of sample means were obtained by correspondence from Dr. Walter A. Shewhart.
All skewness and kurtosis values (except those for the normal population) were
computed by the writers.


Chart 24.3. Distribution of Shewhart's Right-Triangular Population of
820 Items and of 1,000 Sample Means for Samples Having N = 4. The class
intervals were 0.1 for the population and 0.2 for the sample means. For source of
data, see footnote 4.

Shewhart also constructed a rectangular population, shown in Chart
24.4A, which is extremely platykurtic, having $\beta_{2\mathcal{P}} = 1.80$. From this
population he obtained 1,000 sample means ($N = 4$), the distribution of
which is also given in Chart 24.4A. This curve looks as if it might be
nearly mesokurtic. The kurtosis of these sample means would be
expected to be

$$\beta_{2\bar{X}} = \frac{1.80 - 3}{4} + 3 = 2.70.$$

For the 1,000 sample means, $\beta_{2\bar{X}} = 2.99$.

⁵ See footnote 4.

Shewhart did not consider a leptokurtic population, but Alfred J. Kana
designed such a population of 1,000 items, which is shown in Chart
24.4B. From this population, Kana obtained 400 sample means ($N = 5$),
the distribution of which also appears in Chart 24.4B. The kurtosis
of the population was $\beta_{2\mathcal{P}} = 7.927$. Selecting samples of five items each
could be expected to yield

$$\beta_{2\bar{X}} = \frac{7.927 - 3}{5} + 3 = 3.985.$$

Chart 24.4A. Distribution of Shewhart's Rectangular (Platykurtic)
Population of 122 Items and of 1,000 Sample Means for Samples Having N = 4.
The class intervals were 0.1 for the population and 0.2 for the sample means. For
source of data, see footnote 4.

Only 400 samples were drawn, but for this group of samples it was found
that $\beta_{2\bar{X}} = 4.190$, a value much nearer to 3.0 than the value of $\beta_{2\mathcal{P}}$.
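The expected kurtosis values quoted for the normal, rectangular, and leptokurtic populations follow directly from the relationship given earlier. A minimal sketch, in which the function name is an illustrative choice and the inputs are the population values stated in the text:

```python
def expected_kurtosis_of_means(beta2_population, sample_size):
    """Expected beta_2 of sample means: (beta_2P - 3) / N + 3."""
    return (beta2_population - 3.0) / sample_size + 3.0


normal_case = expected_kurtosis_of_means(3.0, 4)        # Shewhart's normal population
rectangular_case = expected_kurtosis_of_means(1.80, 4)  # Shewhart's rectangular population
leptokurtic_case = expected_kurtosis_of_means(7.927, 5) # Kana's population

print(f"normal      : {normal_case:.3f}")
print(f"rectangular : {rectangular_case:.3f}")
print(f"leptokurtic : {leptokurtic_case:.3f}")
```

Whatever the population value, the excess over 3.0 is divided by $N$, so even small samples pull the distribution of means toward the mesokurtic value.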

Sample means and the normal curve. From what has been said,
it is clear that the distribution of sample means is normal when those
means have been computed from random samples from a normal
population. If a population is skewed, the skewness present in sample means
drawn from that population will be much less, the skewness being
inversely related to the size of the sample as indicated by

$$\beta_{1\bar{X}} = \frac{\beta_{1\mathcal{P}}}{N}.$$

If a population is leptokurtic or platykurtic, the distribution of sample
means drawn from that population will be more nearly mesokurtic, as
shown by

$$\beta_{2\bar{X}} = \frac{\beta_{2\mathcal{P}} - 3}{N} + 3.$$

Chart 24.4B. Distribution of Kana's Leptokurtic Population of 1,000
Items and of 400 Sample Means for Samples Having N = 5. The class
intervals were 1.0 for both series. The kurtosis values, given in the text,
were computed from ungrouped data for both series. Data from Alfred J. Kana.

As a consequence of these two relationships, statisticians consider
sample means to be distributed normally unless there is reason to believe
that the population from which they were taken departs markedly from
normal.

Dispersion of sample means. A glance at any of the four preceding
charts will reveal that the dispersion of sample means is much less than
the dispersion of the population from which those sample means came.
The relationship is⁶

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}.$$

For the population data of Chart 24.1, we have $\sigma = 1.0070$ and $N = 4$.
Consequently,

$$\sigma_{\bar{X}} = \frac{1.0070}{\sqrt{4}} = 0.5035.$$

For the 1,000 sample means, the standard deviation may be computed
using the expression

$$\sqrt{\frac{(\bar{X}_1 - \bar{\bar{X}})^2 + (\bar{X}_2 - \bar{\bar{X}})^2 + \cdots + (\bar{X}_{1,000} - \bar{\bar{X}})^2}{1,000}},$$

where $\bar{\bar{X}}$ denotes the arithmetic mean of the 1,000 sample means.
The value of the standard deviation for the frequency distribution of
sample means, shown in Chart 24.1, is 0.503, which agrees very closely
with the value of 0.5035 that would have been obtained if we could have
considered all possible samples of $N = 4$.

⁶ See Appendix S, section 24.2. Note that, as shown in the proof, the expression
used above is not valid unless the population is large in relation to $N$.



Chart 24.6. Distribution of Sample Arithmetic Means for $\bar{X}_{\mathcal{P}}$ = 50
mm and $\sigma$ = 8 mm, When N = 16 (Solid Curve) and When N = 64
(Broken Curve).


From the expression

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$$

it is obvious that (1) the greater the dispersion of the population, the
greater the dispersion of sample means taken from that population; and
(2) the larger the size of the samples, the smaller the dispersion of sample
means. These points are illustrated in Chart 24.5, which shows the
distributions of sample means for two different values of $\sigma$ when $N$ is
unchanged, and in Chart 24.6, which shows the distributions of sample
means for two sample sizes from the same population.
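Both points can be checked with the figures already given. The sketch below evaluates $\sigma/\sqrt{N}$ for the population of Chart 24.1 and for the two sample sizes of Chart 24.6; the function name is an illustrative choice.

```python
import math


def standard_error_of_mean(sigma, sample_size):
    """Standard error of the sample mean, sigma / sqrt(N).

    Assumes the population is large in relation to N.
    """
    return sigma / math.sqrt(sample_size)


chart_24_1 = standard_error_of_mean(1.0070, 4)   # Shewhart's normal population
n16 = standard_error_of_mean(8.0, 16)            # Chart 24.6, solid curve
n64 = standard_error_of_mean(8.0, 64)            # Chart 24.6, broken curve

print(f"sigma = 1.0070, N = 4  -> {chart_24_1:.4f}")
print(f"sigma = 8 mm,   N = 16 -> {n16:.1f} mm")
print(f"sigma = 8 mm,   N = 64 -> {n64:.1f} mm")
```

Quadrupling the sample size from 16 to 64 only halves the standard error, since $N$ enters through its square root.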



SIGNIFICANCE OF THE DIFFERENCE BETWEEN $\bar{X}$ AND $\bar{X}_{\mathcal{P}}$
WHEN $\bar{X}_{\mathcal{P}}$ AND $\sigma$ ARE KNOWN

A difference between $\bar{X}$ and $\bar{X}_{\mathcal{P}}$ that is not significant. Consider
the tire-mileage data referred to previously for which $\bar{X}_{\mathcal{P}} = 15,200$ miles
and $\sigma = 1,248$ miles. If random samples of 100 tires are to be drawn, we
would expect the sample means to have

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} = \frac{1,248}{\sqrt{100}} = 124.8 \text{ miles}.$$

Chart 24.7. Expected Distribution of Sample Arithmetic Means, from a
Normal Population, Showing the 0.05 and 0.01 Levels.

Consequently, the sample means would be distributed as shown in Chart
24.7. In this chart, particular attention has been called to the deviations
of $\pm 1.96\sigma_{\bar{X}}$ and $\pm 2.58\sigma_{\bar{X}}$. As may be seen from the chart, $\pm 1.96\sigma_{\bar{X}}$ cuts
off 5 per cent of the area of the curve in the two tails, while $\pm 2.58\sigma_{\bar{X}}$ cuts
off 1 per cent of the area of the curve in the two tails. These percentages
may be obtained from the table of areas of the normal curve (Appendix
E) which we used in the preceding chapter or, more readily, from
Appendix H, which shows areas in two tails of the normal curve. The two
deviations shown in Chart 24.7 are those which denote, for the normal
curve, the 0.05 level and the 0.01 level. Significance tests make frequent
use of the 0.05 and 0.01 levels, although other levels (for example, 0.001,
0.005, 0.02, and 0.025) are also employed.
One sample of 100 items, allegedly a random sample and supposedly
drawn from the population mentioned in the preceding paragraph, was
found to have $\bar{X} = 15,269$ miles. We are interested in knowing whether
it is reasonable to believe that this sample mean is the arithmetic mean
of a random sample from the population having $\bar{X}_{\mathcal{P}} = 15,200$ miles and
$\sigma = 1,248$ miles. The difference between $\bar{X}$ and $\bar{X}_{\mathcal{P}}$ is 69 miles. In
order to be able to refer to the normal curve, we express this difference in
terms of $\sigma_{\bar{X}}$, which has already been ascertained to be 124.8 miles.
Therefore,

$$\frac{x}{\sigma} = \frac{\bar{X} - \bar{X}_{\mathcal{P}}}{\sigma_{\bar{X}}} = \frac{15,269 - 15,200}{124.8} = \frac{69}{124.8} = 0.55.$$

Chart 24.8. Distribution of Sample Means and Chances of Obtaining
Sample Means Differing from $\bar{X}_{\mathcal{P}}$ by $\pm 0.55\sigma_{\bar{X}}$ or More.

Referring to Chart 24.8, we may see the area under the normal curve (the
cross-hatched portion) which is cut off by a deviation of $+0.55\sigma_{\bar{X}}$. From
Appendix G, which shows areas in one tail of the normal curve, this
cross-hatched tail is found to include 29 per cent of the area under the curve.
Since we know that sample means both exceed and fall below $\bar{X}_{\mathcal{P}}$, we
consider also the tail of the normal curve cut off by $-0.55\sigma_{\bar{X}}$, which is the
stippled portion in Chart 24.8. This tail, too, includes 29 per cent of the
area under the curve, and the two tails combined contain 58 per cent
($P = 0.58$) of the area under the curve. From this we conclude that,
since a difference of $\pm 0.55\sigma_{\bar{X}}$ may occur so frequently through the
operations of random sampling, there is no adequate basis for thinking that
the sample mean was not the mean of a random sample from the
population under consideration.

The foregoing involved setting up the hypothesis that the sample
mean was the mean of a random sample from the population having
$\bar{X}_{\mathcal{P}} = 15,200$ miles and $\sigma = 1,248$ miles. This hypothesis is referred to
as a “null hypothesis,” since it is a hypothesis of no difference between
$\bar{X}$ and $\bar{X}_{\mathcal{P}}$. The next step consisted of testing the hypothesis by
computing a significance ratio $\dfrac{x}{\sigma}$ and determining the probability of obtaining
a deviation equal to or greater than that observed, as a result of random
sampling. Our test casts much doubt (if $P$ is small) or little doubt (if
$P$ is large) on the hypothesis. Since $P$ was found to be 0.58, our
hypothesis was not impugned.

Note that we did not “prove” the hypothesis. Statistically, a
hypothesis can never be “proven” or “disproven.” By means of
repeated experiments which always yield consistent differences, or lack
of them, an investigator might eventually consider a hypothesis false or
valid. Statistical tests, however, can merely cast much or little doubt
upon a hypothesis, thus discrediting or failing to discredit the hypothesis.

A difference between $\bar{X}$ and $\bar{X}_{\mathcal{P}}$ that is significant. Consider
another sample of 100 tires having $\bar{X} = 14,738$ miles. To test the
hypothesis that this mean is the mean of a random sample from the
population having $\bar{X}_{\mathcal{P}} = 15,200$ miles and $\sigma = 1,248$ miles, we compute

$$\frac{x}{\sigma} = \frac{\bar{X} - \bar{X}_{\mathcal{P}}}{\sigma_{\bar{X}}} = \frac{14,738 - 15,200}{124.8} = \frac{-462}{124.8} = -3.70.$$

Referring to Appendix H, which shows areas in two tails of the normal
curve, we find that $P = 0.000216$. This is pictured in Chart 24.9.
Since a difference such as that observed could be expected to occur so
infrequently as a result of random sampling, the null hypothesis is not
tenable. The sample mean may have been the mean of a non-random
sample from the population under consideration, it may have been the
mean of a random sample from a different population, or it may have been
the mean of a non-random sample from a different population. In any
event, we feel justified in declaring that it is not (that is, it is extremely
unlikely to be) the mean of a random sample from the population having
$\bar{X}_{\mathcal{P}} = 15,200$ miles and $\sigma = 1,248$ miles.
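The two significance ratios and their probabilities can be reproduced without tables. The sketch below uses the complementary error function to obtain the area in two tails of the normal curve; the function name is an illustrative choice. Because the ratio is not rounded to two decimals before the area is found, the results differ slightly from the tabled values 0.58 and 0.000216.

```python
import math


def two_tail_test(sample_mean, population_mean, sigma, sample_size):
    """Return (significance ratio, two-tail P) for a mean with known sigma."""
    standard_error = sigma / math.sqrt(sample_size)
    ratio = (sample_mean - population_mean) / standard_error
    # Area in both tails of the normal curve beyond |ratio|.
    p_value = math.erfc(abs(ratio) / math.sqrt(2.0))
    return ratio, p_value


ratio_1, p_1 = two_tail_test(15269, 15200, 1248, 100)
ratio_2, p_2 = two_tail_test(14738, 15200, 1248, 100)

print(f"first sample : x/sigma = {ratio_1:+.2f}, P = {p_1:.2f}")
print(f"second sample: x/sigma = {ratio_2:+.2f}, P = {p_2:.6f}")
```

The sign of the ratio shows the direction of the difference; the two-tail probability depends only on its absolute value.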

The two tests which we have made were both two-tail (or two-sided)
tests, since we considered either plus or minus differences as tending to
discredit the null hypothesis. Sometimes, as we shall see in later portions
of this text, a positive divergence will tend to discredit a hypothesis, while
a negative difference will not; in such a case, we should consider only the
area in the right tail of the appropriate curve. When a negative
difference tends to discredit a hypothesis, but a positive difference does not, we
take cognizance of the area in the left tail of the curve.⁷

The value of P and significance. We have just considered two
differences, one of which was declared “significant” and one “not
significant.” These examples were purposely selected to illustrate
conclusions that would be obvious once $P$ had been determined. How small
should be the value of $P$ in order for a difference to be declared significant?
This is not an easy question to answer,⁸ since the answer depends largely
upon the nature of the phenomenon being considered and the
consequences of being wrong.

Chart 24.9. Expected Distribution of Sample Means and Chances of Obtaining
Sample Means Differing from $\bar{X}_{\mathcal{P}}$ by $\pm 3.70\sigma_{\bar{X}}$ or More.

For the sample having $\bar{X} = 14,738$ miles, we found $P$ to be 0.000216
and considered the null hypothesis to be discredited. Actually, it is
possible that the hypothesis was true and our conclusion wrong, since
random samples would show a deviation equal to or greater than $3.70\sigma_{\bar{X}}$
exactly 216 times in a million.

⁷ There are also situations in which we may wish to make a two-tail test with
unequal areas in the two tails. See, for example, the illustration given in M. G.
Kendall, The Advanced Theory of Statistics, Vol. II, Charles Griffin and Company
Limited, London, 1948 (Second Edition), p. 99.

⁸ This relatively innocent-appearing problem involves very complicated aspects.
For another non-technical discussion, see L. H. C. Tippett, Technological Aspects of
Statistics, John Wiley and Sons, New York, 1950, pp. 93-95. A more detailed
presentation will be found in H. M. Walker and Joseph Lev, Statistical Inference, Henry
Holt and Co., New York, 1953, pp. 162-167 (concerning means) and 44-79 (dealing
with proportions).
Type I errors. When a null hypothesis is actually true, and when the
difference under consideration is declared not significant (that is, the
hypothesis is not impugned), the conclusion is correct. When a null
hypothesis is actually true, but when the difference involved is declared
significant (that is, the hypothesis is discredited), we say that a “Type I
error” has been made. If we use $P = 0.05$ as our criterion of significance,
declaring significant all differences having $P \le 0.05$, we shall make a
Type I error exactly 1 time out of 20 in the long run; if we use $P = 0.01$ as our
criterion of significance, declaring significant all differences having $P \le
0.01$, we will make a Type I error 1 time out of 100 in the long run. It must
be clear that, the lower the value of $P$ which is used as a criterion, the
fewer Type I errors that will be made. Unfortunately, decreasing the
proportion of Type I errors serves to increase the sort of error described
in the next paragraph.
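The long-run frequency of Type I errors can be illustrated by simulation. The sketch below assumes a normal population with the tire-mileage mean and standard deviation, draws many random samples of 100 for which the null hypothesis is true by construction, and counts how often the difference is declared significant at the 0.05 level. The function name, the seed, and the number of trials are arbitrary choices.

```python
import math
import random


def type_one_error_rate(trials=4000, sample_size=100, mean=15200.0,
                        sigma=1248.0, criterion=0.05, seed=1):
    """Fraction of true-null samples declared significant (two-tail test)."""
    rng = random.Random(seed)
    standard_error = sigma / math.sqrt(sample_size)
    declared_significant = 0
    for _ in range(trials):
        # A random sample from the population named in the hypothesis.
        sample_mean = sum(rng.gauss(mean, sigma)
                          for _ in range(sample_size)) / sample_size
        ratio = (sample_mean - mean) / standard_error
        p_value = math.erfc(abs(ratio) / math.sqrt(2.0))
        if p_value <= criterion:
            declared_significant += 1
    return declared_significant / trials


rate = type_one_error_rate()
print(f"proportion of Type I errors at the 0.05 level: {rate:.3f}")
```

With a finite number of trials the observed proportion only approximates 0.05; the "1 out of 20" of the text is the long-run value.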

Type II errors. When a null hypothesis is actually false and when the
difference under consideration is declared significant, the conclusion is
correct. When a null hypothesis is actually false, but when the difference
being examined is declared not significant, we say that a “Type II error”
has been made. If we use $P = 0.05$ as the criterion, we cannot say how
frequently Type II errors will occur, since we cannot know how false the
hypothesis may be. The sample (or samples) may be a non-random one
from the population involved, or the sample may be a random or
non-random one from a population other than the one involved. In this
situation, we can merely say that, if we use $P = 0.05$ as a criterion, we
should expect to make fewer Type II errors than if $P = 0.01$ is employed.⁹

⁹ We may, however, state the probability of Type II errors if we set up an
alternative hypothesis. The left curve in the accompanying diagram represents a test
(using 0.05 in the right tail as the criterion) of the hypothesis that $\bar{X}$ is the mean of a
random sample from a population having $\bar{X}_{\mathcal{P}}$ as its mean, only positive values of
$\bar{X} - \bar{X}_{\mathcal{P}}$ serving to discredit the hypothesis. Any value of $\bar{X}$ falling between
$-\infty$ and $+j$ would cause us to accept the hypothesis. If the true value of $\bar{X}_{\mathcal{P}}$ is that
shown at the center of the right curve, then the probability of a Type II error is
represented by the shaded area, which is about 0.20. Other alternative hypotheses
may also be set up. Note that if the true $\bar{X}_{\mathcal{P}}$ is farther to the right, the probability of
Type II errors is decreased; if the true $\bar{X}_{\mathcal{P}}$ is farther to the left, the probability of
Type II errors is increased. From the chart it is also clear that, if the black area
(representing the probability of Type I errors if $\bar{X}_{\mathcal{P}}$ at the left is the true mean) is
decreased, the probability of Type II errors (if the true $\bar{X}_{\mathcal{P}}$ is as noted on the chart)
is increased; if the black area is increased, the shaded area is decreased. For a
further discussion see the second reference mentioned in note 8 and also A. M. Mood,
Introduction to the Theory of Statistics, McGraw-Hill Book Company, New York,
1950, pp. 245 ff.

Choice of criterion. For practical purposes, the probability which is to
serve as the criterion of significance should be chosen in the light of the
type of error which should be avoided. If Type I errors should be as few
as possible, $P$ should be very small. If Type II errors should be few, $P$
should be larger. Consider the following examples:

An agricultural experiment station has developed a new hay crop
which is believed to be superior to existing crops, such as alfalfa, lespedeza,
clover, and the like. In order for a farmer to raise the new crop, he must
invest heavily in special machinery for sowing the seed and for harvesting.
If, in the comparison of the new crop with the present crops, a Type I
error were made, farmers who planted the new crop would incur heavy
expenses but would find the new hay to be no better than that formerly
fed to their stock. As a result, the farmers would have experienced heavy
losses. If a Type II error were made, the new crop, though better, would
not be introduced and, while farmers would have failed to gain the
advantages that would have resulted, they would have incurred no actual loss.
In such a situation as this, $P$ should be very small, say 0.01 or 0.001, to
warrant one in declaring the observed difference to be significant.

Not long ago the United States Food and Drug Administration acted
against a chemical manufacturing concern, alleging that digitalis sold by
the firm was half-strength. The difficulty said to be involved was that
persons using this digitalis and becoming accustomed to it might
experience serious consequences if they shifted to a full-strength digitalis. In
the case of a drug such as this, it is important that the day-to-day
production be kept in conformance with the standard (population). As
tests are made of each batch, it is essential that no batch should be
appreciably stronger or weaker than the population. If, in testing a
batch, a Type I error were made (that is, if the batch is said to differ
significantly from the population when it actually does not), the result
would be that the batch would be discarded or reprocessed. On the
other hand, if a Type II error were made, we would be stating that the
batch did not differ significantly from the population when a real
difference was actually present, and serious harm, even death, might result to
persons using the drug. In such a situation it is clearly more important
to avoid Type II errors than Type I errors, and $P$ should therefore be
fairly large, say 0.10 or, preferably, larger.
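The way the two kinds of error trade against each other can be made concrete once an alternative hypothesis is fixed. The sketch below considers a right-tail test and an alternative whose true mean lies a stated number of standard errors above the hypothesized mean. The shift of 2.487 standard errors is a hypothetical value, chosen only because it gives a Type II probability of about 0.20 at the 0.05 criterion; it is not taken from the text or its diagram.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def type_two_probability(criterion, shift):
    """Probability of a Type II error in a right-tail test.

    criterion: area in the right tail used as the criterion of significance.
    shift: distance of the true mean above the hypothesized mean,
           in units of the standard error of the mean (assumed known).
    """
    # Smallest significance ratio that would be declared significant.
    critical_ratio = STANDARD_NORMAL.inv_cdf(1.0 - criterion)
    # Chance that a sample from the true population falls below it.
    return STANDARD_NORMAL.cdf(critical_ratio - shift)


HYPOTHETICAL_SHIFT = 2.487  # illustrative assumption, see lead-in

at_005 = type_two_probability(0.05, HYPOTHETICAL_SHIFT)
at_001 = type_two_probability(0.01, HYPOTHETICAL_SHIFT)
at_010 = type_two_probability(0.10, HYPOTHETICAL_SHIFT)

print(f"criterion 0.10 -> P(Type II) = {at_010:.3f}")
print(f"criterion 0.05 -> P(Type II) = {at_005:.3f}")
print(f"criterion 0.01 -> P(Type II) = {at_001:.3f}")
```

For a fixed alternative, a stricter criterion raises the probability of a Type II error and a more lenient one lowers it, which is the reason a fairly large $P$ is suggested when Type II errors are the more serious.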

There will be frequent occasions when one cannot say whether Type I
or Type II errors are more serious. If one is testing the difference of the
mean IQ's of male cooks and of male dishwashers,¹⁰ such a situation
arises. Here the investigator might be satisfied to use $P = 0.05$ as a
criterion.

From the foregoing it should be clear that the same value of $P$ should
not be used as a criterion for all tests. The appropriate level will depend
on the circumstances. One should never state that a result is significant
or not significant without also giving the value of $P$, which may ordinarily
be read with sufficient accuracy from existing tables, interpolation being
rarely called for. Alternatively, one may say: “Significant at the 0.01
(or other) level.” Sometimes an investigator will say: “Significant at the
0.05 (or other) level but not significant at the 0.02 (or other) level.”
Stating the value of $P$ allows the reader to draw his own conclusion
concerning significance.

Another important consideration is the desirability of deciding, in
advance of attacking a problem, the criterion of significance that will be
used. This avoids the possibility that the $P$ value which is obtained may
influence one in setting his criterion. This is particularly likely to happen
if one “hopes” for a significant or non-significant difference.

Probability and everyday occurrences. The reader may feel that 
the conclusions regarding significance and based upon probabilities 
involve a new basis of thinking which he has not encountered before. 
This may be true, in that we are using some of the most elementary ideas 
of mathematical probability.¹⁰ However, basing decisions upon 
probability of some sort has been an everyday occurrence throughout 
everyone’s life. The student studying for an examination considers the parts 
of the course about which the instructor is likely to ask questions and the 
portions not likely to be covered in the examination. This crude 
subjective sort of probability serves as a guide to him as he reviews. The 
baseball coach must consider the chances (or “play the percentages,” as 
the radio commentators say) before he orders a squeeze play or before he 
puts in a right-handed batting pinch hitter, batting at 0.240, to replace a 
left-handed batting regular, batting at 0.290, to face a left-handed pitcher. 
Before one approaches his boss for a raise, he usually considers whether 
today, tomorrow, or some other day will likely be most propitious. On 

⁹ Differences between two sample means are discussed on pages 651-657. 

¹⁰ See, for example, James G. Smith and Acheson J. Duncan, Elementary Statistics 
and Applications, McGraw-Hill Book Co., Inc., New York, 1944, Chapter 10. 

a much larger scale, unions are not likely to demand wage increases during 
the slackest months of the year or during a depression. Similarly, 
utilities are not apt to ask for rate increases when business is in the 
doldrums. 

Size of sample. Occasionally one may wish to know the sample size 
which will give a specified degree of assurance that sample means will fall 
within designated limits. For the data of tire mileage, where 
$\bar{X}_{\mathcal{P}}$ = 15,200 miles and $\sigma$ = 1,248 miles, what sample size would result in sample 
means varying within ± 200 miles for 98 out of 100 samples? The answer 
is obtained by substituting in the expression 

$$\frac{x}{\sigma} = \frac{\bar{X} - \bar{X}_{\mathcal{P}}}{\sigma / \sqrt{N}}$$

the known and designated values and the value of $x/\sigma$ (from Appendix H 
or the last row of Appendix I) which cuts off two tails which include two 
per cent of the area of the normal curve. Since the $x/\sigma$ value is 2.326, we 
have 

$$2.326 = \frac{200}{1{,}248 / \sqrt{N}},$$

$$200\sqrt{N} = (2.326)(1{,}248) = 2{,}902.8,$$

$$\sqrt{N} = 14.5,$$

$$N = 210.$$
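The sample-size arithmetic can be checked with a short sketch. It assumes only the figures quoted above ($\sigma$ = 1,248 miles, limits of ± 200 miles, 98 samples out of 100); the function name `sample_size_for_mean` is hypothetical, and the normal deviate is taken from Python's standard library rather than from Appendix H.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, half_width, assurance):
    """Sample size for which sample means fall within +/- half_width of the
    population mean with the stated assurance (sigma known, normal curve)."""
    # The excluded area (1 - assurance) is split equally between the two tails.
    z = NormalDist().inv_cdf(1 - (1 - assurance) / 2)
    n_exact = (z * sigma / half_width) ** 2
    return z, n_exact, ceil(n_exact)


z, n_exact, n_needed = sample_size_for_mean(sigma=1248, half_width=200, assurance=0.98)
print(round(z, 3), round(n_exact, 1), n_needed)
```

Carried without intermediate rounding, the computation gives a value slightly above the 210 obtained in the text, where the square root of N is rounded to 14.5 before squaring; rounding up to the next whole item is a choice made in this sketch, not a rule stated in the text.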


SIGNIFICANCE OF THE DIFFERENCE BETWEEN $\bar{X}$ 
AND $\bar{X}_{\mathcal{P}}$ WHEN $\sigma$ IS NOT KNOWN 

The preceding discussion has dealt only with the procedure which is 
applicable when $\bar{X}_{\mathcal{P}}$ and $\sigma$ are known. It is very unusual for population 
values to be available. This will be obvious if we enumerate the most 
important conditions under which population values may be known. 
They are: 

(1) A complete census may have been taken. Thus, from the most 
recent United States census $\bar{X}_{\mathcal{P}}$ and $\sigma$ could be computed for ages of all 
persons enumerated. (Note that the rounding tendency, mentioned on 
pages 22-23, would affect the accuracy of these, or any other, age figures 
not based on correctly reported dates of birth.) 

(2) Population values may be known as the result of extensive 
experience. This is the type of situation illustrated by the tire-mileage data. 

(3) Much like the preceding is the setting up of a “control population” 
to serve as a standard in quality control. Here, many units are 
manufactured under carefully controlled conditions, and the statistical values 
computed from these units are treated as population data. Day-to-day 
production figures are then compared with the population data. 

(4) Population values may be known or assumed upon the basis of 
hypothesis or theory. Cases are encountered most frequently when 
dealing with proportions rather than means. In a test to ascertain 

TABLE 24.1 

Breaking Strength of 10 Specimens of 
0.104-Inch Diameter Hard-Drawn 
Copper Wire 

Specimen    Breaking strength          X² 
              in pounds, X 

    1             578               334,084 
    2             572               327,184 
    3             570               324,900 
    4             568               322,624 
    5             572               327,184 
    6             570               324,900 
    7             570               324,900 
    8             572               327,184 
    9             596               355,216 
   10             584               341,056 

  Total         5,752             3,309,232 

Data from American Society for Testing Materials, 
Supplements to 1933 A.S.T.M. Manual on Presentation 
of Data, “Supplement A — Presenting Plus and Minus 
Limits of Uncertainty of an Observed Average,” p. 1, 
reprinted from Proceedings of the American Society 
for Testing Materials, Vol. 36, Part 1, Philadelphia, 
1936. 


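$$\bar{X} = \frac{\Sigma X}{N} = \frac{5{,}752}{10} = 575.2 \text{ pounds}.$$

$$\hat{\sigma} = \sqrt{\frac{\Sigma X^2}{N-1} - \frac{(\Sigma X)^2}{N(N-1)}} = \sqrt{\frac{3{,}309{,}232}{9} - \frac{(5{,}752)^2}{90}} = \sqrt{75.73} = 8.70 \text{ pounds}.$$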
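As a check on the arithmetic below Table 24.1, the following sketch recomputes the totals, the mean, and the estimate $\hat{\sigma}$ from the ten breaking strengths. The variable names are illustrative only; the sample standard deviation s (divisor N) is included solely to show how it differs from $\hat{\sigma}$ (divisor N − 1).

```python
from math import sqrt

# Breaking strengths in pounds, from Table 24.1.
strengths = [578, 572, 570, 568, 572, 570, 570, 572, 596, 584]

n = len(strengths)
sum_x = sum(strengths)
sum_x2 = sum(x * x for x in strengths)

mean = sum_x / n
# Estimate of the population standard deviation: divisor N - 1.
sigma_hat = sqrt(sum_x2 / (n - 1) - sum_x ** 2 / (n * (n - 1)))
# Standard deviation of the sample itself: divisor N.
s = sqrt(sum_x2 / n - mean ** 2)

print(sum_x, sum_x2, mean, round(sigma_hat, 2), round(s, 2))
```

The estimate $\hat{\sigma}$ is always somewhat larger than s; with only ten items the difference is still visible in the first decimal place.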


whether tea drinkers could differentiate between tea sweetened with 
sugar and with saccharine, the population proportions might be assumed 
to be 0.50 for each sweetening agent. In a preference test for four 
brands of coffee, the population proportions would be taken as 0.25 for 
each brand. 

A difference between $\bar{X}$ and $\bar{X}_{\mathcal{P}}$ that is not significant. Tests 
have been made of the breaking strength of ten pieces of hard-drawn 
copper wire, as shown in Table 24.1. The arithmetic mean of the ten 
values is 575.2 pounds. With 0.01 as our criterion, let us test the 
hypothesis that $\bar{X}$ = 575.2 pounds is the mean of a random sample from 
a population having $\bar{X}_{\mathcal{P}}$ = 577.0 pounds. Now we do not know $\sigma$, and, 
since we lack it, we must make an estimate of $\sigma$ from the data of the 
sample. This estimate is obtained from the expression¹¹ 


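$$\hat{\sigma} = \sqrt{\frac{\Sigma x^2}{N-1}} = \sqrt{\frac{\Sigma X^2}{N-1} - \frac{(\Sigma X)^2}{N(N-1)}}$$

for ungrouped data, or 

$$\hat{\sigma} = i\sqrt{\frac{\Sigma f(d')^2}{N-1} - \frac{(\Sigma fd')^2}{N(N-1)}}$$

for grouped data. 

$\hat{\sigma}^2$ is called an “unbiased” estimate of $\sigma^2$, since¹² 

$$\frac{\hat{\sigma}_1^2 + \hat{\sigma}_2^2 + \cdots + \hat{\sigma}_K^2}{K} \rightarrow \sigma^2$$

as the number of samples, K, increases without limit. $s^2$ is not an unbiased 
estimate of $\sigma^2$, since 

$$\frac{s_1^2 + s_2^2 + \cdots + s_K^2}{K} \rightarrow \frac{N-1}{N}\,\sigma^2.$$

Now that we have $\hat{\sigma}$, we are in a position to make an estimate of $\sigma_{\bar{X}}$. 
This is¹³ 

$$\hat{\sigma}_{\bar{X}} = \frac{\hat{\sigma}}{\sqrt{N}}.$$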

For the data of breaking strength of copper wire, the computation of $\hat{\sigma}$ 
is shown below Table 24.1, and 

$$\hat{\sigma}_{\bar{X}} = \frac{8.70}{\sqrt{10}} = 2.75 \text{ pounds}.$$

¹¹ The basic expression for $\hat{\sigma}$ is developed in Appendix S, section 24.3. The forms 
for ungrouped and for grouped data are obtained from this basic expression by the 
same procedure as that given in Appendix S, section 10.2. 

¹² See Appendix S, section 24.3. 

¹³ If s is known for a sample, it may be converted into $\hat{\sigma}$ by use of 

$$\hat{\sigma} = s\sqrt{\frac{N}{N-1}}.$$

However, such a conversion is not necessary, since we can write 

$$\hat{\sigma}_{\bar{X}} = \frac{s}{\sqrt{N-1}}.$$

It must be clear that, as N increases, the numerical difference between s and $\hat{\sigma}$ becomes 
of negligible importance. Nevertheless, it is incorrect to use s as an estimate of $\sigma$. 

We may now compute the significance ratio 

$$t = \frac{\bar{X} - \bar{X}_{\mathcal{P}}}{\hat{\sigma}_{\bar{X}}}.$$

This significance ratio differs from those previously used because the 
denominator is an estimate of $\sigma_{\bar{X}}$. Because of this substitution, we are 
no longer in a position to refer to the normal curve, but must make use 
of the t distribution, which, though symmetrical, is more widely dispersed 
than is the normal curve. This may be seen in Chart 24.10. The spread 
of the t distribution depends upon the number of “degrees of freedom” 
(n) present, the dispersion being greatest for n = 1 and decreasing as n 
increases. As n approaches infinity, the t distribution approaches the 
normal distribution as a limit. This tendency is apparent from a look 
at Chart 24.10. For significance tests involving a single sample mean, 
such as the one under consideration, n = N − 1 because we used the 
deviations of N values about their own mean in order to compute $\hat{\sigma}$. In 
other words, we employed, not N, but N − 1 independent deviations. 
For the data of breaking strength of copper wire, 

$$t = \frac{\bar{X} - \bar{X}_{\mathcal{P}}}{\hat{\sigma}_{\bar{X}}} = \frac{575.2 - 577.0}{2.75} = \frac{-1.8}{2.75} = -0.65.$$

The value of P is ascertained by referring to Appendix I for n = N − 1 = 
10 − 1 = 9 and t = 0.65. This appendix table is somewhat different 
from the preceding table of the normal curve. Both tables show areas 
in two tails of the respective distributions, but Appendix H shows values 
of P for selected values of $x/\sigma$, while Appendix I shows values of t for 
specified values of n and P. From Appendix I it is seen that 0.50 < P < 0.60, 
and we conclude that there is no significant difference between $\bar{X}$ and $\bar{X}_{\mathcal{P}}$. 
Chart 24.11, which shows a t distribution for 9 degrees of freedom, 
illustrates what has been done. 
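Where a t table is not at hand, the two-tail probability can be approximated numerically. The sketch below is an illustration only, not the method of Appendix I: it integrates the usual unit-area t density by Simpson's rule, and the helper names `t_density` and `two_tail_p` are hypothetical.

```python
from math import gamma, pi, sqrt


def t_density(t, n):
    """Ordinate of the t distribution with n degrees of freedom (unit area)."""
    constant = gamma((n + 1) / 2) / (sqrt(n * pi) * gamma(n / 2))
    return constant * (1 + t * t / n) ** (-(n + 1) / 2)


def two_tail_p(t, n, steps=2000):
    """Area in both tails beyond +/-|t|, by Simpson's rule on [0, |t|]."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, n) + t_density(upper, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, n)
    central_half = total * h / 3  # area between 0 and |t|
    return 1 - 2 * central_half


# Copper-wire example: sample mean, hypothetical population mean,
# estimated standard error of the mean, and sample size.
mean, pop_mean, se_hat, N = 575.2, 577.0, 2.75, 10
t = (mean - pop_mean) / se_hat
p = two_tail_p(t, N - 1)
print(round(t, 2), round(p, 2))
```

The probability found in this way lies between 0.50 and 0.60, in agreement with the reading from Appendix I.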

A difference between $\bar{X}$ and $\bar{X}_{\mathcal{P}}$ that is significant. Norman C. 
Wiley¹⁴ gives data of tests of strength of three-inch manila rope, showing, 
for one sample, N = 16, $\bar{X}$ = 9,959 pounds, and s = 248 pounds. Using 

¹⁴ The sample data are from Statistical Methods as an Aid in Revising Specifications, 
by N. C. Wiley, a preprint of a paper delivered at the forty-first annual meeting of 
the American Society for Testing Materials. 



Chart 24.10. Comparison of the t Distribution for n = 2, n = 5, and 
n = 20 with the Normal Distribution. The values of t, shown above, are $x/\sigma$ values 
for the normal curve. The ordinates of the t distribution are obtained from the 
expression 

$$\frac{\left(\frac{n-1}{2}\right)!}{\sqrt{n/2}\,\left(\frac{n-2}{2}\right)!}\left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}}.$$

This gives a maximum ordinate which approaches 1.0 as n approaches infinity, and 
thus is comparable to the expression 

$$e^{-\frac{1}{2}\left(\frac{x}{\sigma}\right)^2}$$

for the normal curve. The computation of $\left(\frac{n-1}{2}\right)! \big/ \left(\frac{n-2}{2}\right)!$ is clarified by an 
illustration. If n = 11, the numerator is 5!, while the denominator is 4.5!. The value of 
4.5! is given by 4.5 × 3.5 × 2.5 × 1.5 × 0.5 × $\sqrt{\pi}$. 
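The factorials of half-integers in the caption can be evaluated with the gamma function, since x! equals gamma(x + 1). The sketch below assumes the rescaled form of the ordinate in which the divisor is the square root of n/2 (an assumption made so that the maximum ordinate tends to 1.0, as the caption requires); `max_ordinate` is a hypothetical helper name.

```python
from math import gamma, pi, sqrt


def max_ordinate(n):
    """Height of the rescaled t curve at t = 0 for n degrees of freedom."""
    numerator = gamma((n - 1) / 2 + 1)    # ((n - 1)/2)!
    denominator = gamma((n - 2) / 2 + 1)  # ((n - 2)/2)!
    return numerator / (sqrt(n / 2) * denominator)


# The caption's illustration: 4.5! written out as a product ending in sqrt(pi).
four_and_a_half_factorial = 4.5 * 3.5 * 2.5 * 1.5 * 0.5 * sqrt(pi)

for n in (2, 5, 11, 20, 120):
    print(n, round(max_ordinate(n), 4))
```

The printed heights rise toward 1.0 as n increases, which is the behavior the chart is meant to show.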

the 0.01 level as a criterion, we shall test the hypothesis that $\bar{X}$ = 9,959 
pounds is the mean of a random sample from a population having 
$\bar{X}_{\mathcal{P}}$ = 10,148 pounds. In order to obtain $\hat{\sigma}_{\bar{X}}$ we make use of the 
expression given in footnote 13, 

$$\hat{\sigma}_{\bar{X}} = \frac{s}{\sqrt{N-1}} = \frac{248}{\sqrt{15}} = \frac{248}{3.873} = 64.03 \text{ pounds}.$$

Then we compute 

$$t = \frac{\bar{X} - \bar{X}_{\mathcal{P}}}{\hat{\sigma}_{\bar{X}}} = \frac{9{,}959 - 10{,}148}{64.03} = \frac{-189}{64.03} = -2.95.$$

From the t table of Appendix I, it appears that P is almost exactly 0.01, 
and we reject the hypothesis. The foregoing is shown graphically in 

Chart 24.11. The t Distribution for n = 9, Showing Probability of Obtaining 
t = ±0.65 or More. Between 0.50 and 0.60 of the area under the curve is in the 
two tails. (Horizontal axis: values of t, from −5 to +5.) 

Chart 24.12. Note that, if we had used the normal table of Appendix H, 
the probability would have been misleadingly small, about 0.003! The 
difference in the two probabilities would have been much less if the sample 
had been larger. As may be seen in Chart 24.10 and in Appendix I, the 
t distribution seems to begin to approximate the normal distribution at 
about n = 20. Some statisticians customarily refer to the normal table 
when n ≥ 30, but this seems to have been due to the fact that, for some 
time, the available t tables gave no values of t between n = 30 and 
n = ∞. Appendix I lists t values for n = 30, 40, 60, 120, and ∞. It is 
best to use the t table in all cases where $\hat{\sigma}$ has been used as an estimate 
of $\sigma$. 
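The contrast between the two tables can be reproduced for the manila-rope sample. This sketch uses only the figures quoted in the text (N = 16, mean 9,959, s = 248, hypothetical population mean 10,148) and the normal curve from Python's standard library; it shows what the probability would be if the ratio were improperly referred to the normal table.

```python
from math import sqrt
from statistics import NormalDist

N, mean, s, pop_mean = 16, 9959, 248, 10148

se_hat = s / sqrt(N - 1)  # estimated standard error of the mean
t = (mean - pop_mean) / se_hat

# Two-tail area if the ratio is (improperly) referred to the normal curve.
p_normal = 2 * NormalDist().cdf(-abs(t))

print(round(se_hat, 2), round(t, 2), round(p_normal, 4))
```

The normal-curve probability is roughly a third of the value of about 0.01 read from the t table for 15 degrees of freedom.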
Confidence limits of $\bar{X}_{\mathcal{P}}$. In the illustration just given, it was 
concluded that the sample mean was not the mean of a random sample from 
a population having $\bar{X}_{\mathcal{P}}$ = 10,148 pounds. From a knowledge of the 
sample alone, what can be said about the limits within which $\bar{X}_{\mathcal{P}}$ may be 
expected to occur? We want two values for $\bar{X}_{\mathcal{P}}$, which we shall call 
$\bar{X}_{\mathcal{P}_1}$ and $\bar{X}_{\mathcal{P}_2}$ and which will be, respectively, smaller than and larger than 
$\bar{X}$. These are the “confidence limits” of $\bar{X}_{\mathcal{P}}$. The first step consists of 
deciding how often we are willing to be wrong in our statement of 
confidence limits. Suppose that we can allow ourselves to be wrong not 



Chart 24.12. The t Distribution for n = 15, Showing Probability of Obtaining 
t = ±2.95 or More. Almost exactly 0.01 of the area under the curve is in the 
two tails. 


more than 5 times in 100. In that case, we want the 95 per cent 
confidence limits. These limits are obtained by determining: 

(1) the value of $\bar{X}_{\mathcal{P}_1}$ so located that $\bar{X}$ cuts off the upper 2½ per cent 
tail of the distribution of sample means around $\bar{X}_{\mathcal{P}_1}$, and 

(2) the value of $\bar{X}_{\mathcal{P}_2}$ so located that $\bar{X}$ cuts off the lower 2½ per cent 
tail of the distribution of sample means around $\bar{X}_{\mathcal{P}_2}$. 

Both of these values may be had from the following expression, in 
which we substitute the already computed values of $\bar{X}$ and $\hat{\sigma}_{\bar{X}}$, and the 
t value for the appropriate confidence limits: 

$$\bar{X} = \bar{X}_{\mathcal{P}} \pm t\hat{\sigma}_{\bar{X}}.$$

Since we want the 95 per cent confidence limits, and since n = 15, the 
value of t (from Appendix I) is 2.131. We have, then 

$$9{,}959 = \bar{X}_{\mathcal{P}} \pm (2.131)(64.03),$$

$$\bar{X}_{\mathcal{P}} = 9{,}959 \pm 136.4,$$

$$= 9{,}822.6 \text{ and } 10{,}095.4 \text{ pounds}.$$

The foregoing procedure is illustrated in Chart 24.13. 
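The computation of the limits is simple enough to state as a short function. The sketch below takes the tabled t value and the rounded standard error exactly as they appear in the text; the function name `confidence_limits` is illustrative.

```python
def confidence_limits(sample_mean, se_hat, t_value):
    """Lower and upper confidence limits: sample mean -/+ t times the standard error."""
    margin = t_value * se_hat
    return sample_mean - margin, sample_mean + margin


# t = 2.131 is the tabled value for n = 15 degrees of freedom at the 0.05 level.
lower, upper = confidence_limits(sample_mean=9959, se_hat=64.03, t_value=2.131)
print(round(lower, 1), round(upper, 1))
```

Substituting a different tabled t value (for example, the 0.01 value for 15 degrees of freedom) would give the limits for another level of confidence from the same two sample figures.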
We are not sure that the population mean falls within the limits just 
given, but we are 95 per cent confident that it does so. In other words, 
if many determinations of 95 per cent confidence limits are made, we can 
expect those limits to include the population value 95 times out of 100 
and to exclude the population value 5 times in 100. Roger P. Doyle 
computed the 95 per cent confidence limits of $\bar{X}_{\mathcal{P}}$ for each of Shewhart's 
1,000 samples from a normal population. Using $\bar{X}$, $\hat{\sigma}_{\bar{X}}$, and n = 3 for each 
sample, he ascertained 1,000 pairs of confidence limits and noted, for each 
pair, whether they did or did not include $\bar{X}_{\mathcal{P}}$ = 0. His confidence limits 
were right in 951 instances, wrong in 49. 
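An experiment of the same kind can be imitated with random numbers. The sketch below is not a reproduction of Shewhart's samples: it assumes a normal population with mean 0 and standard deviation 1, samples of four items (so that n = 3), the tabled value t = 3.182 for 3 degrees of freedom, and an arbitrary seed.

```python
import random
from math import sqrt
from statistics import mean, stdev


def coverage(num_samples=1000, sample_size=4, t_value=3.182, seed=1):
    """Fraction of 95 per cent limits that include the true mean (0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        se_hat = stdev(sample) / sqrt(sample_size)  # stdev uses divisor N - 1
        margin = t_value * se_hat
        if abs(mean(sample)) <= margin:  # limits include the population mean
            hits += 1
    return hits / num_samples


print(coverage())
```

With 1,000 samples the proportion of correct statements should fall close to 0.95, though it will vary somewhat with the seed chosen.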

While the preceding illustration obtained 95 per cent confidence limits, 
any desired limits may be computed, by merely substituting the 
appropriate t value, together with the values of $\bar{X}$ and $\hat{\sigma}_{\bar{X}}$ obtained from the 
sample. Limits such as 99.9, 99.8, 99, 98, 96, 95, and 90 are often used. 
Confidence limits representing less than 90 per cent confidence are not 
often wanted, since they do not express a very high degree of confidence. 

The determination of confidence limits for proportions, sample 
variances ($s^2$ or $\hat{\sigma}^2$), and correlation coefficients will be discussed in the two 
following chapters. For these measures, as well as for arithmetic means, 
the statistical worker should carefully consider the maximum and 
minimum possible values for the measure in question. Occasionally, the very 
nature of the variable sets limits, beyond which values cannot occur, and 
which should take precedence over computed confidence limits. 

The expression for determining the confidence limits of $\bar{X}_{\mathcal{P}}$ was written 

$$\bar{X} = \bar{X}_{\mathcal{P}} \pm t\hat{\sigma}_{\bar{X}},$$

rather than 

$$\bar{X}_{\mathcal{P}} = \bar{X} \pm t\hat{\sigma}_{\bar{X}},$$

which would have given the same results. The purpose of doing this was 
to stress the fact that sample means are distributed around $\bar{X}_{\mathcal{P}}$. Chart 
24.13 also attempts to make this clear. There is no such thing as a 
distribution of population means around $\bar{X}$. 

The illustrations given on the preceding 7 pages all involved $\hat{\sigma}_{\bar{X}}$ and 
the t distribution. It may be well to stress the point that variations in 
the value of t occur because of sampling variations of $\hat{\sigma}$ as well as because 
of sampling variations of $\bar{X}$. A large value of t (and therefore a small 
P value) may result from the fact that $\bar{X}$ differs greatly from $\bar{X}_{\mathcal{P}}$, or 
because $\hat{\sigma}$ is smaller than $\sigma$, or both. A small value of t (and therefore a 
large P value) may occur because $\bar{X}$ closely approximates $\bar{X}_{\mathcal{P}}$ or because 
$\hat{\sigma}$ exceeds $\sigma$, or both. When $\sigma$ is known, the only sampling variations 
present are those of $\bar{X}$. 

SIGNIFICANCE OF THE DIFFERENCE BETWEEN TWO 
SAMPLE MEANS 

Independent samples. From archaeological excavations 
conducted at a certain site, 16 lower first molars were recovered.¹⁵ We do 
not have the measurements of each of the 16 teeth, but we know that 
$\bar{X}_1$ = 13.57 millimeters and $s_1$ = 0.72 millimeters. From a nearby site, 
9 lower first molars were taken with $\bar{X}_2$ = 13.06 and $s_2$ = 0.62 
millimeters. Using P = 0.05 as a criterion, is there a significant difference 
in the mean length of these two groups of lower first molars? To make 
this test, we set up the null hypothesis that the two sample means are 
from the same population in regard to $\bar{X}_{\mathcal{P}}$,¹⁶ and we test this hypothesis by 
determining the probability of t, where t is the ratio of $\bar{X}_1 - \bar{X}_2$ to an 
estimate of the standard error of the difference between the two sample 
means. 

As shown in Appendix S, section 24.4, the standard error of the 
difference between two sample means is given by 

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2},$$

provided that the two samples are independent. Non-independent 
samples are considered later in this chapter. The expression just given may 
be written 

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sigma\sqrt{\frac{1}{N_1} + \frac{1}{N_2}}.$$

We cannot make use of this formula for our problem, since we do not 
know the value of $\sigma$. (If we knew $\sigma$, we would almost certainly know 
$\bar{X}_{\mathcal{P}}$ as well, since $\sigma$ is computed around $\bar{X}_{\mathcal{P}}$. If we knew $\bar{X}_{\mathcal{P}}$, it would be 
more meaningful to compare $\bar{X}_1$ and $\bar{X}_2$ with $\bar{X}_{\mathcal{P}}$ than to compare the two 
sample means with each other.) Consequently, we make an estimate of 

¹⁵ Based upon illustrative figures used in a lecture by Professor Egon Pearson at 
Columbia University. 

¹⁶ The assumption is made that the two samples are from the same population in 
regard to variance, $\sigma^2$. This assumption is not unreasonable for our problem, since 
an F test, described in Chapter 26, reveals that there is not a significant difference 
between $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$. When two samples are believed to be from populations of unequal 
variance, and when $N_1 = N_2$, or when $N_1 \neq N_2$ and both are large, an approximate 
test may be made by using 

$$\hat{\sigma}_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\hat{\sigma}_1^2}{N_1} + \frac{\hat{\sigma}_2^2}{N_2}}.$$

For a discussion of procedures when the population variances are unequal, see Maurice 
G. Kendall, The Advanced Theory of Statistics, Charles Griffin and Co., Ltd., London, 
1948, Vol. II, pp. 111-114. 





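the value of $\sigma$, from the information given by the two samples. This 
estimate¹⁷ is 

$$\hat{\sigma} = \sqrt{\frac{\Sigma x_1^2 + \Sigma x_2^2}{N_1 - 1 + N_2 - 1}}.$$

When the individual observations are available for each sample, as is 
usually the case, we may compute 

$$\Sigma x^2 = \Sigma X^2 - \frac{(\Sigma X)^2}{N}$$

for ungrouped data, or 

$$\Sigma x^2 = i^2\left[\Sigma f(d')^2 - \frac{(\Sigma fd')^2}{N}\right]$$

for grouped data. 

For the problem at hand, we do not have the individual observations, but 
we do have $s_1$ and $s_2$. Since 

$$s = \sqrt{\frac{\Sigma x^2}{N}},$$

$$\Sigma x_1^2 = N_1 s_1^2 \quad \text{and} \quad \Sigma x_2^2 = N_2 s_2^2.$$

We therefore compute 

$$\Sigma x_1^2 = 16(0.72)^2 = 8.29;$$

$$\Sigma x_2^2 = 9(0.62)^2 = 3.46.$$

The estimated value of $\sigma$ is then obtained. 

$$\hat{\sigma} = \sqrt{\frac{8.29 + 3.46}{16 - 1 + 9 - 1}} = 0.715.$$

The estimated standard error of the difference between the two means 
may now be computed: 

$$\hat{\sigma}_{\bar{X}_1 - \bar{X}_2} = 0.715\sqrt{\frac{1}{16} + \frac{1}{9}} = 0.298.$$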

¹⁷ $\hat{\sigma}^2$ is a weighted average of the two $\hat{\sigma}^2$ values for the separate samples. See 
Appendix S, section 24.5. Section 24.6 shows that when $N_1 = N_2$, 

$$\hat{\sigma}\sqrt{\frac{1}{N_1} + \frac{1}{N_2}} = \sqrt{\hat{\sigma}_{\bar{X}_1}^2 + \hat{\sigma}_{\bar{X}_2}^2}.$$

When more than two samples are involved, the estimate of $\sigma^2$ is given by 

$$\hat{\sigma}^2 = \frac{\Sigma x_1^2 + \Sigma x_2^2 + \Sigma x_3^2 + \cdots}{N_1 - 1 + N_2 - 1 + N_3 - 1 + \cdots}.$$

We shall make use of this expression in connection with the discussion of analysis of 
variance in Chapter 26. 

Finally we may obtain the desired significance ratio, 

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\hat{\sigma}_{\bar{X}_1 - \bar{X}_2}} = \frac{13.57 - 13.06}{0.298} = \frac{0.51}{0.298} = 1.71.$$

From the first set of data, we have $n_1 = N_1 - 1 = 16 - 1 = 15$ degrees 
of freedom; from the second set, $n_2 = N_2 - 1 = 9 - 1 = 8$. Therefore, 
$n = n_1 + n_2 = 23$. Note that one degree of freedom was lost when 
$\Sigma x_1^2$ was computed about $\bar{X}_1$ and another degree was lost when $\Sigma x_2^2$ was 
computed about $\bar{X}_2$. From the t table of Appendix I, we find P ≈ 0.10, 
and we consider the difference between $\bar{X}_1$ and $\bar{X}_2$ not significant. Chart 
24.14 illustrates the foregoing. 
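The steps for two independent samples can be collected into one small function. This is a sketch under the same assumption the text makes (both samples from populations of equal variance), working from the summary figures N, mean, and s for each sample; the name `pooled_t` is hypothetical.

```python
from math import sqrt


def pooled_t(n1, mean1, s1, n2, mean2, s2):
    """t ratio for two independent samples, given s computed with divisor N."""
    sum_sq1 = n1 * s1 ** 2  # sum of squared deviations, sample 1
    sum_sq2 = n2 * s2 ** 2  # sum of squared deviations, sample 2
    dof = (n1 - 1) + (n2 - 1)
    sigma_hat = sqrt((sum_sq1 + sum_sq2) / dof)
    se_diff = sigma_hat * sqrt(1 / n1 + 1 / n2)
    return sigma_hat, se_diff, (mean1 - mean2) / se_diff, dof


# Lower first molars from the two sites (millimeters).
sigma_hat, se_diff, t, dof = pooled_t(16, 13.57, 0.72, 9, 13.06, 0.62)
print(round(sigma_hat, 3), round(se_diff, 3), round(t, 2), dof)
```

The t ratio is then referred to the t table with the returned number of degrees of freedom, as in the text.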



Chart 24.14. The t Distribution for n = 23, Showing Probability of 
Obtaining t = ±1.71 or More. Approximately 0.10 of the area under the curve 
is in the two tails. 


Confidence limits of $\bar{X}_{\mathcal{P}_1} - \bar{X}_{\mathcal{P}_2}$. Occasionally, when it has been 
concluded that a significant difference exists between $\bar{X}_1$ and $\bar{X}_2$, it may 
be desirable to have a statement of the confidence limits of $\bar{X}_{\mathcal{P}_1} - \bar{X}_{\mathcal{P}_2}$. 
This is obtained by solving the expression¹⁸ 

$$\bar{X}_1 - \bar{X}_2 = (\bar{X}_{\mathcal{P}_1} - \bar{X}_{\mathcal{P}_2}) \pm t\hat{\sigma}_{\bar{X}_1 - \bar{X}_2}$$

for $\bar{X}_{\mathcal{P}_1} - \bar{X}_{\mathcal{P}_2}$. As in the determination of confidence limits for $\bar{X}_{\mathcal{P}}$, 
the value of t is read from Appendix I and depends upon (1) the level 
of confidence to be used and (2) the degrees of freedom, which are 
$n = N_1 - 1 + N_2 - 1$. 

¹⁸ As in testing the significance of the difference between $\bar{X}_1$ and $\bar{X}_2$, it is assumed 
that the two samples are from the same population in regard to $\sigma^2$. 

To illustrate the use of the expression given above, consider the yield 
point of structural steel (for ships) obtained from two sources.¹⁹ For 
source 1: $N_1$ = 10, $\bar{X}_1$ = 45,948 pounds per square inch, and $s_1$ = 2,910 
pounds per square inch. For source 2: $N_2$ = 19, $\bar{X}_2$ = 39,820 pounds per 
square inch, and $s_2$ = 2,510 pounds per square inch. Employing the 
same expressions just used for the data of lower first molars, it is found 
that $\hat{\sigma}_{\bar{X}_1 - \bar{X}_2}$ = 1,074.9 and 


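$$t = \frac{\bar{X}_1 - \bar{X}_2}{\hat{\sigma}_{\bar{X}_1 - \bar{X}_2}} = \frac{45{,}948 - 39{,}820}{1{,}074.9} = \frac{6{,}128}{1{,}074.9} = 5.7.$$

This value of t for $n = n_1 + n_2 = 9 + 18 = 27$ is far beyond the 0.001 
level, so the difference between the means is significant. 

To obtain the 98 per cent confidence limits of $\bar{X}_{\mathcal{P}_1} - \bar{X}_{\mathcal{P}_2}$ we use 
t = 2.473 and substitute the known values in 

$$\bar{X}_1 - \bar{X}_2 = (\bar{X}_{\mathcal{P}_1} - \bar{X}_{\mathcal{P}_2}) \pm t\hat{\sigma}_{\bar{X}_1 - \bar{X}_2}.$$

This gives 

$$45{,}948 - 39{,}820 = (\bar{X}_{\mathcal{P}_1} - \bar{X}_{\mathcal{P}_2}) \pm (2.473)(1{,}074.9),$$

$$\bar{X}_{\mathcal{P}_1} - \bar{X}_{\mathcal{P}_2} = 6{,}128 \pm 2{,}658,$$

$$= 3{,}470 \text{ and } 8{,}786 \text{ pounds per square inch}.$$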
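The structural-steel figures can be verified in a few lines. The sketch below recomputes the estimated standard error of the difference from the quoted summary values and then forms the 98 per cent limits; the tabled value t = 2.473 for 27 degrees of freedom is taken from the text, and the variable names are illustrative.

```python
from math import sqrt

n1, mean1, s1 = 10, 45948, 2910  # source 1, pounds per square inch
n2, mean2, s2 = 19, 39820, 2510  # source 2, pounds per square inch

dof = (n1 - 1) + (n2 - 1)
# s is computed with divisor N, so N * s**2 restores each sum of squared deviations.
sigma_hat = sqrt((n1 * s1 ** 2 + n2 * s2 ** 2) / dof)
se_diff = sigma_hat * sqrt(1 / n1 + 1 / n2)

diff = mean1 - mean2
t_ratio = diff / se_diff
t_98 = 2.473  # tabled t for 27 degrees of freedom at the 0.02 level
lower, upper = diff - t_98 * se_diff, diff + t_98 * se_diff

print(round(se_diff, 1), round(t_ratio, 1), round(lower), round(upper))
```

Because both limits are well above zero, the interval agrees with the conclusion that the two sources differ in mean yield point.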

Non-independent samples. When inherent pairing exists between 
the pairs of items in two samples, it usually follows that the two samples 
are not independent. We are not concerned if the first, and succeeding, 
pairs of values in the two samples just happen to be paired because they 
were selected in the order listed; we are concerned if, for example, the 
paired readings are values of IQ's of brothers and sisters or of twins, or 
if the values are mileages of tires on original treads and after recapping. 
By far the greatest majority of problems which will be encountered will 
deal with independent samples. However, it is extremely important 
that non-independent samples be recognized as such; they must not be 
treated as independent samples. 

The data of Table 24.2 show the percentage of solids in the shaded and 
exposed halves of 25 grapefruit. Here, it is obvious that the two sets of 
data are not independent; they are inherently paired. The shaded side 
of grapefruit Number 1 had 8.59 per cent solids while the exposed side 
of the same grapefruit had 8.49 per cent solids. These two figures are 
inherently paired with each other, because they refer to the same indi- 
vidual fruit. The same is true of the figures for the other 24 grapefruit. 


¹⁹ The data are from the source given in footnote 15. 

TABLE 24.2 

Percentage of Solids in the Shaded and Exposed Halves 
of 25 Grapefruit 

 Fruit     Shaded     Exposed     D = X₁ − X₂       D² 
             X₁          X₂ 

    1       8.59        8.49          0.10         0.0100 
    2       8.59        8.59          0.00         0.0000 
    3       8.09        7.84          0.25         0.0625 
    4       8.54        7.89          0.65         0.4225 
    5       8.09        8.19         -0.10         0.0100 
    6       8.49        7.84          0.65         0.4225 
    7       7.89        7.89          0.00         0.0000 
    8       8.59        7.89          0.70         0.4900 
    9       8.54        7.79          0.75         0.5625 
   10       7.99        7.84          0.15         0.0225 
   11       7.89        7.79          0.10         0.0100 
   12       8.09        7.84          0.25         0.0625 
   13       7.89        7.89          0.00         0.0000 
   14       8.54        8.07          0.47         0.2209 
   15       7.84        7.97         -0.13         0.0169 
   16       7.49        7.57         -0.08         0.0064 
   17       7.89        7.92         -0.03         0.0009 
   18       7.79        7.97         -0.18         0.0324 
   19       7.84        8.17         -0.33         0.1089 
   20       8.89        8.67          0.22         0.0484 
   21       8.54        8.07          0.47         0.2209 
   22       8.04        7.97          0.07         0.0049 
   23       8.59        8.62         -0.03         0.0009 
   24       8.19        7.92          0.27         0.0729 
   25       8.59        7.97          0.62         0.3844 

 Total    205.50      200.66          4.84         3.1938 

Data from Paul L. Harding, Plant Physiologist, Division of Fruit and 
Vegetable Crops and Diseases, Bureau of Plant Industry, Soils, and 
Agricultural Engineering, Agricultural Research Administration, United States 
Department of Agriculture. 

$$\bar{X}_D = \frac{\Sigma D}{N} = \frac{4.84}{25} = 0.194 \text{ per cent}.$$

$$\hat{\sigma}_D = \sqrt{\frac{\Sigma D^2}{N-1} - \frac{(\Sigma D)^2}{N(N-1)}} = \sqrt{\frac{3.1938}{24} - \frac{(4.84)^2}{(25)(24)}}$$

$$= \sqrt{0.133075 - 0.039043} = \sqrt{0.094032}$$

$$= 0.307 \text{ per cent}.$$

$$\hat{\sigma}_{\bar{X}_D} = \frac{\hat{\sigma}_D}{\sqrt{N}} = \frac{0.307}{\sqrt{25}} = 0.061 \text{ per cent}.$$

In order to test the significance of the difference between the means for 
shaded and exposed halves, we obtain the difference D between each pair 
of values, determine the value of $\bar{X}_D$, and ascertain whether $\bar{X}_D$ differs 
significantly from 0. The null hypothesis is that $\bar{X}_D$ is the mean of a 
random sample from a population of differences having a mean of zero. 
Below Table 24.2 the computations are shown which give 

$\bar{X}_D$ = 0.194 per cent, 

$\hat{\sigma}_D$ = 0.307 per cent, and 
$\hat{\sigma}_{\bar{X}_D}$ = 0.061 per cent. 


We then determine the value of t, 

$$t = \frac{\bar{X}_D - 0}{\hat{\sigma}_{\bar{X}_D}} = \frac{0.194 - 0}{0.061} = 3.18.$$

Since there are 24 independent D values, n = 24, and reference to 
Appendix I shows that P is between 0.01 and 0.001. 
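The paired computation can be followed step by step from the D column of Table 24.2. The sketch below uses the standard library's `stdev`, which employs the divisor N − 1 and therefore corresponds to $\hat{\sigma}_D$; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# D = shaded minus exposed, one value per grapefruit (Table 24.2).
D = [0.10, 0.00, 0.25, 0.65, -0.10, 0.65, 0.00, 0.70, 0.75, 0.15,
     0.10, 0.25, 0.00, 0.47, -0.13, -0.08, -0.03, -0.18, -0.33, 0.22,
     0.47, 0.07, -0.03, 0.27, 0.62]

n_pairs = len(D)
mean_d = mean(D)
sigma_hat_d = stdev(D)  # divisor N - 1
se_mean_d = sigma_hat_d / sqrt(n_pairs)
t = (mean_d - 0) / se_mean_d  # null hypothesis: mean difference is zero
dof = n_pairs - 1

print(round(mean_d, 3), round(sigma_hat_d, 3), round(se_mean_d, 3), round(t, 2), dof)
```

Carried without intermediate rounding, t comes out slightly below the 3.18 obtained in the text from the rounded values 0.194 and 0.061; the conclusion regarding P is unchanged.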

It is very important that the lack of independence between the two 
samples be recognized in such a problem as this. Had we followed the 
usual procedure, which assumes the samples to be independent, 
computing $\bar{X}_1$ = 8.22 per cent, $\bar{X}_2$ = 8.03 per cent, and $\hat{\sigma}_{\bar{X}_1 - \bar{X}_2}$ = 0.092 per 
cent, we would have obtained 

$$t = \frac{8.22 - 8.03}{0.092} = \frac{0.19}{0.092} = 2.07,$$

which, for n = 48, has 0.025 < P < 0.05. This probability differs 
greatly from that found first. In fact, if one were using the 0.02 or 
0.01 level as a criterion of significance, the method assuming 
independence of the two samples would have led him erroneously to conclude 
“not significant.” 

The possible consequences of employing the method which assumes 
independence of the two samples when they are not, in fact, independent 
may be clarified by writing $\hat{\sigma}_{\bar{X}_1 - \bar{X}_2}$ in an alternative form,²⁰ 

$$\hat{\sigma}_{\bar{X}_1 - \bar{X}_2} = \sqrt{\hat{\sigma}_{\bar{X}_1}^2 + \hat{\sigma}_{\bar{X}_2}^2 - 2r\hat{\sigma}_{\bar{X}_1}\hat{\sigma}_{\bar{X}_2}},$$

when r is the correlation between the two samples. If the shorter form, 

$$\hat{\sigma}_{\bar{X}_1 - \bar{X}_2} = \sqrt{\hat{\sigma}_{\bar{X}_1}^2 + \hat{\sigma}_{\bar{X}_2}^2},$$

which assumes independence, is used, the value of $\hat{\sigma}_{\bar{X}_1 - \bar{X}_2}$ will be too large 
when there is positive correlation between the two sets of data and too 
small when negative correlation is present. Ignoring the lack of 
independence may cause us to fail to declare a significant difference when r 
is positive and to erroneously declare a difference to be significant when 
r is negative. In most problems involving inherent pairing, the 
correlation will be positive, but occasional cases occur in which the correlation 
is negative. In any event, when inherent pairing occurs, correlation 
between the two series is also almost certain to be present. The chance 
correlation that may appear between two series having $N_1 = N_2$ and 
known to be independent is of no concern to us. 

²⁰ The two forms are exact equivalents, but the expression involving r requires 
much more computation. For the grapefruit data, using r = +0.577, $\hat{\sigma}_{\bar{X}_1 - \bar{X}_2}$ = 0.061, 
which agrees with the value for $\hat{\sigma}_{\bar{X}_D}$. 
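The equivalence of the two forms can be demonstrated on the grapefruit data. The sketch below computes the correlation between the paired readings directly from the two columns of Table 24.2 and compares three standard errors: the one that ignores the pairing, the one that allows for it through r, and the one obtained from the differences D. All variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

shaded = [8.59, 8.59, 8.09, 8.54, 8.09, 8.49, 7.89, 8.59, 8.54, 7.99,
          7.89, 8.09, 7.89, 8.54, 7.84, 7.49, 7.89, 7.79, 7.84, 8.89,
          8.54, 8.04, 8.59, 8.19, 8.59]
exposed = [8.49, 8.59, 7.84, 7.89, 8.19, 7.84, 7.89, 7.89, 7.79, 7.84,
           7.79, 7.84, 7.89, 8.07, 7.97, 7.57, 7.92, 7.97, 8.17, 8.67,
           8.07, 7.97, 8.62, 7.92, 7.97]

n = len(shaded)
m1, m2 = mean(shaded), mean(exposed)
se1, se2 = stdev(shaded) / sqrt(n), stdev(exposed) / sqrt(n)

# Product-moment correlation between the paired readings.
cross = sum((a - m1) * (b - m2) for a, b in zip(shaded, exposed))
r = cross / ((n - 1) * stdev(shaded) * stdev(exposed))

se_independent = sqrt(se1 ** 2 + se2 ** 2)                 # ignores the pairing
se_paired = sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)  # allows for it
se_from_d = stdev([a - b for a, b in zip(shaded, exposed)]) / sqrt(n)

print(round(r, 3), round(se_independent, 3), round(se_paired, 3), round(se_from_d, 3))
```

Since r is positive here, the standard error that ignores the pairing is the larger of the two, which is why the independent-samples method gave the smaller t ratio.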

CONCLUSION 

This chapter has made no attempt to contrast “large-number methods” 
and “small-number methods.” The reason is that when $\sigma$ is known the 
normal curve is appropriate for samples of any size, large or small. When 
$\sigma$ is not known, and when $\hat{\sigma}$ is employed in its place, the t distribution (a 
“small-number method”) is always the proper one to use. As 
n increases, the t distribution approaches the normal curve, so that for 
large samples the normal distribution is sometimes applied. However, 
even when n is large, the normal curve is an approximation. (Sometimes, 
when a sample is large, s rather than $\hat{\sigma}$ is used as an estimate of $\sigma$. The 
numerical difference between s and $\hat{\sigma}$ is slight for large samples, but the 
use of s as an estimate of $\sigma$ should be avoided.) 

Since the methods discussed in this chapter are just as applicable to 
small samples as to large samples, the question may arise: why bother 
to use large samples? The answer is that, when one makes use of large 
samples, a smaller observed difference $\bar{X} - \bar{X}_{\mathcal{P}}$ or $\bar{X}_1 - \bar{X}_2$ is necessary 
to obtain significance at a specified probability level. This is true, (1) 
because $\sigma_{\bar{X}}$ (or $\hat{\sigma}_{\bar{X}}$) and $\hat{\sigma}_{\bar{X}_1 - \bar{X}_2}$ tend to decrease with an increase in sample 
size, while $\bar{X} - \bar{X}_{\mathcal{P}}$ and $\bar{X}_1 - \bar{X}_2$ do not have a corresponding tendency 
to decrease, since they may either increase or decrease; also, (2) because 
the t value required for the specified probability level decreases as n 
increases. Occasionally, as a result of using small samples, one may come 
to the conclusion that an observed difference is not significant, when, if 
large samples had been used, the difference (which itself would probably 
change) might have been significant. 

The tests discussed in this chapter undertook to ascertain whether 
statistical differences were or were not present. It is worth while to note 
that generic differences, as opposed to statistical differences, may exist, 
and that, when a generic difference is present, a statistical difference may 
or may not also be present. A generic difference is an actual difference 
in kind and may, for example, refer to males and females, railroad ties of 
different kinds of wood or preserved by different processes, or roofing 
nails made of copper or galvanized steel. The tests of yield points of 
structural steel, referred to earlier in this chapter, are an illustration of a 
case where a generic difference and a statistical difference were both 
present; the steel from Source 1 was lighter-weight material than was the 
steel from Source 2. If tests were to be made of the reaction times of a 
group of rabbits and a group of guinea pigs, it is quite possible that a 
statistically significant difference in reaction times might not be present 
although the two groups are generically different. 

Part 1: Proportions 

a: number of occurrences in a sample. 

$a_1$: number of occurrences in sample 1. 
$a_2$: number of occurrences in sample 2. 

$\alpha$: lower-case Greek alpha; number of occurrences in a population. 

A: indicating an occurrence; A has no numerical value. 
b: number of non-occurrences in a sample. 

$\beta$: lower-case Greek beta; number of non-occurrences in a population. 

B: indicating a non-occurrence; B has no numerical value. 
k: number of samples. 

N: the number of items in a sample. 
$N_1$: number of items in sample 1. 

$N_2$: number of items in sample 2. 
p: proportion of occurrences in a sample. 

$p_k$: proportion of occurrences in the k'th sample. 

$p_1$: proportion of occurrences in sample 1. 

$p_2$: proportion of occurrences in sample 2. 

$\bar{p}$: an estimate of $\pi$ based on two samples; a weighted average of $p_1$ and $p_2$. 
P: probability; varies from 0 to 1. 

$\pi$: lower-case Greek pi; proportion of occurrences in a population. 

$\pi_1$: the lower confidence limit of $\pi$. 

$\pi_2$: the upper confidence limit of $\pi$. 

q: proportion of non-occurrences in a sample. q = 1 − p. 

$q_1$: proportion of non-occurrences in sample 1. 
$q_2$: proportion of non-occurrences in sample 2. 
$\bar{q}$: $1 - \bar{p}$. 

$\sigma_a$: the standard error of a. 

$\sigma_p$: the standard error of p. 

$\hat{\sigma}_{p_1 - p_2}$: estimated standard error of the difference between $p_1$ and $p_2$. 
$\tau$: lower-case Greek tau; proportion of non-occurrences in a population. 
$\tau = 1 - \pi$. 

$x/\sigma$: a deviation divided by its standard error; for example, $\frac{p - \pi}{\sigma_p}$ and 
$\frac{a - \pi N}{\sigma_a}$. 

Part 2: The Chi-Square Test 

a: number of occurrences in a sample. 

$a_1$: number of observed frequencies in the upper left cell of a 2 × 2 table 
or, in general, in any 2 × R table. 
$a_2$: number of observed frequencies in the second row of the first column 
of a 2 × R table; in the lower left cell of a 2 × 2 table. 

$a_3$: number of observed frequencies in the third row of the first column of 
a 2 × R table. 

A: indicating an occurrence; A has no numerical value. 
b: number of non-occurrences in a sample. 

$b_1$: number of observed frequencies in the upper right cell of a 2 × 2 table 
or, in general, in any 2 × R table. 

$b_2$: number of observed frequencies in the second row of the second 
column of a 2 × R table; in the lower right cell of a 2 × 2 table. 

$b_3$: number of observed frequencies in the third row of the second column 
of a 2 × R table. 

B: indicating a non-occurrence; B has no numerical value. 

C: number of columns of observed frequencies (exclusive of totals) in a 
chi-square table which has its marginal totals set. 

f: an observed frequency. 
$f_c$: a computed frequency. 
n: degrees of freedom. 

N: number of items in a sample. For 2 × 2 and larger tables, N is the 
number of items in the entire table. 

$N_a$: number of frequencies (items) in the first column of a 2 × R table. 
$N_b$: number of frequencies (items) in the second column of a 2 × R table. 

$N_1, N_2, N_3, \cdots$: respectively, number of frequencies (items) in the 
first, second, third, ··· row of a 2 × R table. 
p: proportion of occurrences in a sample. 

$p_1$: proportion of occurrences in sample 1. 
$p_2$: proportion of occurrences in sample 2. 

P: probability; varies from 0 to 1. 

$\pi$: lower-case Greek pi; proportion of occurrences in a population. 

R: number of rows of observed frequencies (exclusive of totals) in a 
chi-square table which has its marginal totals set. 

$\sigma^2$: the variance of a population. 

$\hat{\sigma}^2$: the estimated variance of a population. 

$\sigma_a$: the standard error of a. 

$\sigma_p$: the standard error of p. 

$\Sigma$: upper-case Greek sigma; meaning “take the sum of.” 
$x/\sigma$: a deviation divided by its standard error; for example, 
$\frac{a - \pi N}{\sigma_a}$ and $\frac{p - \pi}{\sigma_p}$. 

$\chi^2$: chi-square. The symbol is a lower-case Greek chi. 

!: factorial. For example, 4! = 1 × 2 × 3 × 4. 



CHAPTER 25 


Statistical Significance II:
Proportions and the Chi-Square Test


In this chapter we shall consider procedures for dealing with proportions
from random samples and shall also give attention to certain aspects of
the chi-square test. The reason for combining these two topics in one
chapter lies in the fact that the chi-square test and the approximate tests
for proportions are alternative methods of arriving at identical
conclusions. This will be clarified in the second part of the chapter.

PART 1: PROPORTIONS

The following discussion of proportions obtained from random samples
will deal, first, with the significance of the difference between a sample
proportion (p) and the proportion in the population (π) when the
proportion in the population is known; second, with the confidence limits
of π when only p and N are known; and, finally, with the significance of the
difference between the proportions of two random samples (p_1 and p_2).

Significance of the Difference Between p and π

The exact test, π = 0.50. In a large assortment of marbles, half are
black and half are white. The marbles do not differ from each other in
any respect except color. Considering a black marble as an "occurrence"
and a white marble as a "non-occurrence" (a non-occurrence of
black), and using τ to indicate the proportion of non-occurrences in the
population and π the proportion of occurrences, we have π = 0.50 and
τ = 0.50.¹ Suppose that a sample of 10 marbles is presented, which has
9 black marbles. We have then: number of occurrences, a = 9; number
of non-occurrences, b = 1; proportion of occurrences, p = 0.90;
proportion of non-occurrences, q = 0.10. Note that

$$p = \frac{a}{a + b} = \frac{a}{N}, \qquad q = \frac{b}{a + b} = \frac{b}{N}, \qquad p + q = 1.0.$$

¹ If the number of occurrences and the number of non-occurrences in a
population are known, π is the number of occurrences divided by their sum
and τ is the number of non-occurrences divided by their sum. From this it
is clear that π + τ = 1.0 and τ = 1 − π.

Using P = 0.05 as a criterion, let us test the hypothesis that the sample is
a random one from the population having π = 0.50.

Samples of N = 10 can have a = 0, 1, 2, ..., 10 and p = 0, 0.1,
0.2, ..., 1.0, according to the expression

$$(\tau B + \pi A)^{10},$$

where A and B, which have no numerical value, are used to indicate,
respectively, an occurrence and a non-occurrence. Since π = 0.50 and
τ = 0.50,

$$
\begin{aligned}
(\tau B + \pi A)^{10} &= (0.50B + 0.50A)^{10} \\
&= (0.50B)^{10} + 10(0.50B)^{9}(0.50A) \\
&\quad + 45(0.50B)^{8}(0.50A)^{2} + 120(0.50B)^{7}(0.50A)^{3} \\
&\quad + 210(0.50B)^{6}(0.50A)^{4} + 252(0.50B)^{5}(0.50A)^{5} \\
&\quad + 210(0.50B)^{4}(0.50A)^{6} + 120(0.50B)^{3}(0.50A)^{7} \\
&\quad + 45(0.50B)^{2}(0.50A)^{8} + 10(0.50B)(0.50A)^{9} \\
&\quad + (0.50A)^{10}.
\end{aligned}
$$

Performing the indicated computations and placing the results in
columnar form gives:

Number of occurrences    Proportion of occurrences
    of black balls            of black balls
          a                         p               Probability

          0                        0                  0.0010
          1                        0.1                0.0098
          2                        0.2                0.0439
          3                        0.3                0.1172
          4                        0.4                0.2051
          5                        0.5                0.2461
          6                        0.6                0.2051
          7                        0.7                0.1172
          8                        0.8                0.0439
          9                        0.9                0.0098
         10                        1.0                0.0010
                                                      ------
                                                      1.0000





From the foregoing, it appears that the probability of obtaining random
samples having 9 or 10 black marbles is 0.0098 + 0.0010 = 0.0108. This
is represented by the two bars at the extreme right in Chart 25.1. Since
we have no reason to believe that the samples would always contain a
larger proportion of black marbles than did the population, we consider
likewise the probability of one or no black balls, which is also 0.0108 and
which is represented by the two bars at the extreme left in Chart 25.1.

Chart 25.1. Probability of Occurrence of Values of a and p in Samples of
10 When π = 0.50. (Data from the expansion of (0.50B + 0.50A)^10 =
0.0010B^10 + 0.0098B^9A + 0.0439B^8A^2 + 0.1172B^7A^3 + 0.2051B^6A^4 +
0.2461B^5A^5 + 0.2051B^4A^6 + 0.1172B^3A^7 + 0.0439B^2A^8 + 0.0098BA^9 +
0.0010A^10.) Vertical axis: probability.

The probability of 9 or more and 1 or fewer black marbles is therefore
0.0216. Using the criterion of 0.05, we reject the hypothesis that the
sample was a random one from the population having π = 0.50. Remember
that, on the basis of this criterion, we would make Type I errors in
5 per cent of our conclusions.

If we had been using 0.01 as our criterion, we would not have rejected
our hypothesis. Had we been employing 0.01 as our criterion and had
we been concerned with samples having ten (or no) black balls, the
probability would have been 0.0020 and we would have rejected the hypothesis.
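
The expansion and the two-tail probability above can be checked with a short sketch. The function name `binomial_terms` is an illustrative choice, not something from the text. Because the terms are summed here before rounding, the two-tail figure prints as 0.0215 rather than the 0.0216 obtained above from the rounded terms.

```python
from math import comb


def binomial_terms(n, pi):
    """Probability of a = 0, 1, ..., n occurrences, i.e. the terms of (tau*B + pi*A)^n."""
    tau = 1.0 - pi
    return [comb(n, a) * pi**a * tau ** (n - a) for a in range(n + 1)]


terms = binomial_terms(10, 0.50)
for a, prob in enumerate(terms):
    print(f"a = {a:2d}   p = {a / 10:.1f}   probability = {prob:.4f}")

# Two-tail probability: 9 or more black marbles, or 1 or fewer.
two_tail = sum(terms[9:]) + sum(terms[:2])
print(f"two-tail probability = {two_tail:.4f}")
print("reject at 0.05:", two_tail < 0.05)
print("reject at 0.01:", two_tail < 0.01)
```

With the 0.05 criterion the hypothesis is rejected; with 0.01 it is not, in agreement with the discussion above.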

An approximate test, π = 0.50. It has already been pointed out
(pages 591-594) that the normal curve is the limit of the binomial as the
exponent of the binomial approaches infinity. For practical purposes,
the normal curve is often considered to be a reasonably good description
of the binomial

$$(0.50B + 0.50A)^{N}$$

when N ≥ 20. Chart 25.2 shows a normal curve fitted to (0.50B +
0.50A)^10. As we shall see later, the apparently good description of the
binomial by the normal curve is no guarantee that the procedure involving
the use of the normal curve will lead us to the same conclusion as the
binomial.


Chart 25.2. Normal Curve Fitted to (0.50B + 0.50A)^10. Vertical axis: probability.


If the normal curve can be substituted for the binomial, we may
compute the standard deviation of a sample percentage to ascertain the value
of

$$\frac{x}{\sigma} = \frac{p - \pi}{\sigma_p}$$

and proceed as in Chapter 21 for testing the difference between a sample
mean and the population mean when σ is known. If we
had a large number of sample proportions (p_1, p_2, p_3, ..., p_k), all from
random samples from the same population, we could compute the
standard deviation of those proportions from

$$\sigma_p = \sqrt{\frac{(p_1 - \pi)^2 + (p_2 - \pi)^2 + \cdots + (p_k - \pi)^2}{k}}.$$

It is very unusual to have a large number of such p values, but it can be
shown² that, when π is known, the standard error of p from random
samples is

$$\sigma_p = \sqrt{\frac{\pi\tau}{N}}.$$

Alternative forms which are sometimes useful are

$$\sqrt{\frac{\pi(1 - \pi)}{N}} \quad \text{and} \quad \sqrt{\frac{\pi - \pi^2}{N}}.$$


Let us see whether the approximate test will lead to the same conclusion
as did the exact test for the marbles, where π = 0.50, a = 9, p = 0.90,
and N = 10. We first compute

$$\sigma_p = \sqrt{\frac{\pi\tau}{N}} = \sqrt{\frac{(0.50)(0.50)}{10}} = 0.158,$$

and then

$$\frac{x}{\sigma} = \frac{p - \pi}{\sigma_p} = \frac{0.90 - 0.50}{0.158} = \frac{0.40}{0.158} = 2.53.$$

From Appendix H, which shows areas in two tails of a normal curve, we
find that P = 0.0114. Although this value of P is smaller than the
value of 0.0216 obtained by use of the binomial, our conclusion is the
same: if 0.05 is our criterion, the hypothesis is rejected. Note, however,
that if 0.02 had been used as the criterion, the exact method would tell us to
accept the hypothesis while the approximate procedure indicates that the
hypothesis should be rejected.
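
As a hedged check on the arithmetic of the approximate test, the sketch below computes σ_p, the ratio x/σ, and the area in the two tails of the normal curve. It uses the identity that the two-tail area beyond z equals erfc(z/√2), in place of a printed table of areas; the function name `two_tail_normal` is illustrative only.

```python
from math import erfc, sqrt


def two_tail_normal(p, pi, n):
    """Return (sigma_p, x/sigma, two-tail normal probability) for a sample proportion p."""
    sigma_p = sqrt(pi * (1.0 - pi) / n)
    z = (p - pi) / sigma_p
    prob = erfc(abs(z) / sqrt(2.0))  # area in both tails beyond |z|
    return sigma_p, z, prob


sigma_p, z, prob = two_tail_normal(p=0.90, pi=0.50, n=10)
print(f"sigma_p = {sigma_p:.3f}, x/sigma = {z:.2f}, P = {prob:.4f}")
```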

A useful alternative form of the approximate test involves testing the
significance of the difference between a and πN, the number of occurrences
in the sample if the proportion of occurrences in the sample were the same
as in the population; it makes use of

$$\frac{x}{\sigma} = \frac{a - \pi N}{\sigma_a},$$

where³ σ_a = √(Nπτ). For our problem,

$$\sigma_a = \sqrt{10(0.50)(0.50)} = 1.58,$$

and

$$\frac{x}{\sigma} = \frac{a - \pi N}{\sigma_a} = \frac{9 - (0.50)10}{1.58} = \frac{4}{1.58} = 2.53.$$

² See Appendix S, section 25.1.

³ See Appendix S, section 25.1 for a development of the expression for σ_a.
This is, of course, the same x/σ value as was obtained when p and π were
compared. The conclusion, too, is the same. The hypothesis is rejected.

The fact that the approximate test guided us to the same conclusion as
did the exact test, even though the probability given by the normal curve
was incorrect, leads to an interesting question: When π = 0.50, under
what conditions may the normal curve be substituted for the binomial
and the same conclusion be arrived at concerning a hypothesis? The
answer depends on: (1) the size of the sample and (2) the criterion of
significance which is being used. Since the probability resulting from use
of the normal curve is always too small,⁴ when π = 0.50, the use of the
p − π (or a − πN) test will never cause us to accept a hypothesis which
the binomial would tell us to reject. Occasionally the p − π, or a − πN,
test will indicate the rejection of a hypothesis which the use of the
binomial would show should be accepted. Consider the situation when
π = 0.50, N = 60, a = 38 (p = 0.63), and the criterion is P = 0.05.
Using the binomial, it is found that the probability⁵ of obtaining a ≤ 22
or a ≥ 38 is 0.052, and the hypothesis (that the sample is a random one
from a population having π = 0.50) is accepted. Using the normal
curve, the probability⁶ is found to be 0.039 and would indicate that the
hypothesis should be rejected!

Yates'⁷ correction. This correction was designed to be applied to the
normal curve in order to increase the probability obtained from the use of
the normal curve, so that the probability would be more nearly in
agreement with the probability obtained by use of the binomial. If Yates'
correction is applied to the illustrative data just mentioned, the
probability is increased from 0.039 to 0.053 and the conclusion is the same as


⁴ This will be seen to be the case for the various illustrations given in this text. An
explanation is given in the reference mentioned in footnote 7.

⁵ The probability may be obtained from a table in H. G. Romig, 50-100 Binomial
Tables, John Wiley and Sons, New York, 1953.

⁶ The computations are:

$$\frac{x}{\sigma} = \frac{a - \pi N}{\sigma_a} = \frac{38 - 30}{\sqrt{60(0.50)(0.50)}} = 2.066.$$

Referring to Appendix H, the value of P is seen to be 0.039.

⁷ Yates' correction is not explained in this text, since (for reasons which will later
be clear) its use is not advocated. An explanation of Yates' correction is given in
F. E. Croxton, Elementary Statistics with Applications in Medicine, Prentice-Hall, Inc.,
New York, 1953, pp. 251-256.

For the type of problem under consideration, Yates' correction involves computing
(|a − πN| − ½)/σ_a, where | | means "take the absolute value," and looking up the result-




if the binomial had been used. Note, however, that the use of Yates'
correction has over-corrected;⁸ that is, the probability is greater than that
obtained by the binomial. This is important, since the use of the normal
curve with Yates' correction will sometimes result in the accepting of a
hypothesis which the binomial (and the use of the uncorrected normal
curve!) would indicate should be rejected. For example: π = 0.50,
N = 25, a = 4 (p = 0.16), and the criterion is P = 0.001. Using the
binomial, the probability of obtaining a ≤ 4 or a ≥ 21 is found to be
0.000 91. From the normal approximation, a value of P = 0.000 7 is
obtained. Applying Yates' correction, this value of P is increased to
0.001 37. In this case, the uncorrected normal approximation agrees
with the binomial, indicating that the hypothesis should be rejected.
Applying Yates' correction increases the probability to such an extent
that the hypothesis would be accepted!
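
The two illustrations can be compared side by side in a brief sketch. It assumes π = 0.50 throughout, takes the continuity correction to be the subtraction of ½ from |a − πN| as described in the footnote, and obtains normal-curve areas from erfc rather than from a table; the function names are illustrative.

```python
from math import comb, erfc, sqrt


def exact_two_tail(n, a):
    """Exact two-tail probability for the symmetric binomial (pi = 0.50)."""
    dev = abs(a - n / 2)
    favorable = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= dev)
    return favorable / 2**n


def normal_two_tail(n, a, yates=False):
    """Two-tail normal approximation for pi = 0.50, optionally with Yates' correction."""
    sigma_a = sqrt(n * 0.50 * 0.50)
    dev = abs(a - n / 2) - (0.5 if yates else 0.0)
    return erfc(dev / sigma_a / sqrt(2.0))


for n, a in [(60, 38), (25, 4)]:
    print(
        f"N = {n}, a = {a}: "
        f"binomial {exact_two_tail(n, a):.5f}, "
        f"normal {normal_two_tail(n, a):.5f}, "
        f"normal with Yates {normal_two_tail(n, a, yates=True):.5f}"
    )
```

In both cases the uncorrected normal probability falls below the binomial figure and the corrected one falls above it, which is the pattern the text describes.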

A table for the exact test, when π = 0.50. Extensive computations
of the sort just made, and referring to the 0.05, 0.02, 0.01, and 0.001
levels, show that, while the use of the normal curve will ordinarily result
in the same conclusion being arrived at as if the binomial had been used,
this is not by any means always the case. In addition, the use of Yates'
correction will sometimes result in over-correcting to such an extent that
the conclusion to accept the hypothesis will differ from the conclusion
based on the binomial.

One possible solution may have occurred to the reader. That is, to
make the a − πN test both with and without Yates' correction. When
the two procedures lead to the same conclusion, that conclusion will be
the same as if the binomial had been used. This is true because, as we
already know, the a − πN test without correction results in a smaller P
value than does the binomial, while the a − πN test with Yates' correction
yields a larger P value than does the binomial. The difficulty with
this solution is that contradictory conclusions do occasionally occur.
Whenever the two procedures result in different conclusions, resort must
be had to the binomial.

The best solution is to make use of the binomial whenever possible.
Following procedures described before, it is not difficult to expand
binomials up to about N = 20 or 30; but beyond that, the work becomes
extensive. Books are now available from which one may read the values
of the terms of binomials⁹ for (1) N = 2 to N = 49 by steps of 1, and (2)
for N = 50 to N = 100 by steps of 5. Values of π other than 0.50 are
given, but at this point of our discussion we are interested only in π =
0.50. From these tables, Table 25.1 has been constructed, showing the
value of a at various probability points and for selected values of N.
With a table such as this available, one has no need to use the normal
curve, with or without Yates' correction, in order to avoid the labor of
expanding a binomial. Neither is it necessary to expand a binomial,
since Table 25.1 gives the results of such expansions.

For samples having N > 100, the normal approximation will have to
be used until some organization with extensive computing facilities can
provide us with extended tables of binomials.

ing figure in Appendix H. For the illustration above,

$$\frac{|a - \pi N| - \tfrac{1}{2}}{\sigma_a} = \frac{|38 - 30| - \tfrac{1}{2}}{\sqrt{60(0.50)(0.50)}} = 1.936.$$

From Appendix H, P = 0.053.

⁸ Another illustration: when using P = 0.05 as the criterion and with π = 0.50,
N = 100, and a = 40.

The exact test, π ≠ 0.50. A cigarette company published the results
of a "test" in which its product and those of three competitors were
judged by eight physicians, specializing in the treatment of the nose and
the throat. Four of the 8 doctors expressed a preference for the
company's cigarette, which we shall call brand No. 1; two preferred No. 2;
none preferred No. 3; and 2 preferred No. 4. If there were no difference
between the brands, each would have an equal chance of being
selected, so that the probability of brand 1 being preferred would be
0.25; π = 0.25. Now, N = 8. We evaluate, in the expression

$$(0.75B + 0.25A)^{8},$$

the terms which include A^4, A^5, A^6, A^7, and A^8. As before, A indicates
an occurrence (in this instance, a preference for brand No. 1), and B
indicates a non-occurrence.


⁹ These are: (1) National Bureau of Standards, Tables of the Binomial Probability
Distribution, Washington, 1950, and (2) H. G. Romig, 50-100 Binomial Tables, John Wiley
and Sons, New York, 1953. The symbols used in these references differ from
those used in this text.

The reader is urged to remember that, when reversing cumulations of probabilities
such as are given in these references, taking one minus the cumulative probability,
he must: (1) decrease the tabled a value by one when the original cumulation is of the
"or more" type, as in the Bureau of Standards volume, and (2) increase the tabled a
value by one when the original cumulation is of the "or less" type, as in the Romig
book.
TABLE 25.1

Values of a at Selected Lower and Upper Probability Points
for Specified Values of N
π = 0.50
Table 25.2 shows the probability of each of the nine terms of the
binomial. Adding the probabilities for the last five of the terms gives 0.1138,
which is the probability of obtaining four or more favorable statements
for brand No. 1 if the four brands are really alike. It is clear that brand
No. 1 did not receive significantly more than one-fourth of the doctors'
votes. If the size of the sample had been larger, there might have been
a significant difference in favor of brand No. 1. However, there is no
reason to believe that if N were larger, p would still be 0.50.


TABLE 25.2

Probability of Each Term in the Expression (0.75B + 0.25A)^8

Number of occurrences    Proportion of occurrences
(number preferring       (proportion preferring
brand #1)                brand #1)
        a                        p                  Expression               Probability

        0                      0                    (0.75B)^8                  0.1001
        1                      0.125                8(0.75B)^7(0.25A)          0.2670
        2                      0.250                28(0.75B)^6(0.25A)^2       0.3115
        3                      0.375                56(0.75B)^5(0.25A)^3       0.2076
        4                      0.500                70(0.75B)^4(0.25A)^4       0.0865
        5                      0.625                56(0.75B)^3(0.25A)^5       0.0231
        6                      0.750                28(0.75B)^2(0.25A)^6       0.0038
        7                      0.875                8(0.75B)(0.25A)^7          0.0004
        8                      1.000                (0.25A)^8                  0.0000

      Total                                                                    1.0000


Note that in the foregoing we considered only the last five terms of the
binomial, the terms for which p − π ≥ +0.25. We ignored the first
term, which is the only one for which p − π ≤ −0.25. The reason for
making such a one-tail test is that we were interested in knowing whether
the preferences for brand No. 1 significantly exceeded π = 0.25.
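
A minimal sketch of this one-tail computation follows; the helper name `upper_tail` is an illustrative choice. It sums the terms of (0.75B + 0.25A)^8 from A^4 upward.

```python
from math import comb


def upper_tail(n, pi, a):
    """Probability of a or more occurrences in n trials when the population proportion is pi."""
    tau = 1.0 - pi
    return sum(comb(n, k) * pi**k * tau ** (n - k) for k in range(a, n + 1))


p_four_or_more = upper_tail(n=8, pi=0.25, a=4)
print(f"P(a >= 4) = {p_four_or_more:.4f}")
print("significant at 0.05:", p_four_or_more < 0.05)
```

Since the probability exceeds 0.05, the four votes for brand No. 1 are not significantly more than one-fourth of the total.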

An approximate test, π ≠ 0.50. While at an Arabian horse ranch,
the writer was told: "All 30 of the mares had colts this season. This is
unusual, as only 70 to 80 per cent ordinarily have colts in a single season."
Now N = 30, a = 30, p = 1.0, and, considering π to be 0.75, we are in a
position to state just how unusual an occurrence this was. We merely
need to evaluate the term which includes A^30 in the expression

$$(0.25B + 0.75A)^{30},$$

where, as before, A is an occurrence (birth of a colt) and B a
non-occurrence. That term has a probability of 0.000 18, or about 2 in 10,000, and
is a very surprising occurrence, indeed. The ranch owner did not assign
a reason for the surprising fecundity, but one would be justified in
rejecting the hypothesis that the observed p of 1.0 was based on a random
sample from the population represented by his past experience. Note
that, again, we have made a one-tail test, since we wished to know
whether 1.0 significantly exceeded π = 0.75.

Let us see whether the normal curve can be used as a substitute for the
skewed binomial. Since N = 30, the sample is fairly large. However,
π is 0.75 rather than 0.50, as was the case when the normal curve was used
before. We compute

$$\sigma_p = \sqrt{\frac{\pi\tau}{N}} = \sqrt{\frac{(0.75)(0.25)}{30}} = 0.079$$

and

$$\frac{x}{\sigma} = \frac{p - \pi}{\sigma_p} = \frac{1.00 - 0.75}{0.079} = 3.16.$$

From Appendix G we find that a value of x/σ = 3.16 cuts off less than
0.000 97 but more than 0.000 69 of the area of a normal curve, in one tail.
This approximate procedure yields a probability which is much larger
than the exact procedure, but our conclusion concerning p is the same.
This prompts us to raise a question which is similar to one raised earlier:
When π ≠ 0.50, under what conditions may the normal curve be
substituted for the binomial and the same conclusion be arrived at concerning
the hypothesis? The problem is now more complex, since the answer
depends on: (1) the value of π, (2) the size of the sample, and (3) the
criterion of significance which is used. For our purposes it will be
sufficient to note, first, that when π ≠ 0.50, the normal curve is a less
satisfactory approximation to the binomial than when π = 0.50, for any
given N. In fact, when π ≠ 0.50, use of the normal curve will sometimes
yield a probability that is too small and sometimes one that is too large.
Second, Yates' correction can be of no assistance, since it is not designed
for situations in which π ≠ 0.50.
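
The exact and approximate one-tail probabilities for the mares can be set side by side in a short sketch. The variable names are illustrative, and the one-tail normal area is taken as ½·erfc(z/√2) instead of being read from a printed table.

```python
from math import erfc, sqrt

n, pi = 30, 0.75

# Exact: the single term of (0.25B + 0.75A)^30 that contains A^30.
exact = pi**n

# Approximate: one-tail area of the normal curve beyond x/sigma.
sigma_p = sqrt(pi * (1.0 - pi) / n)
z = (1.0 - pi) / sigma_p
approx = 0.5 * erfc(z / sqrt(2.0))

print(f"exact = {exact:.6f}, sigma_p = {sigma_p:.3f}, x/sigma = {z:.2f}, approx = {approx:.6f}")
```

The normal-curve figure is several times the exact one, although either would lead to rejection at the usual criteria.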

Tables for the exact test when π ≠ 0.50. For situations in which
π ≠ 0.50, we need a series of tables, similar to Table 25.1, each table
having to do with a different π value. Such an undertaking is too
ambitious for an elementary text, and, in any event, the values of the terms of
skewed binomials may be obtained from the two references cited in
footnote 9. For purposes of illustration, Table 25.3 has been prepared,
dealing with the probability points for samples of various sizes when
π = 0.20 or π = 0.80.


Confidence Limits of π

Sometimes the value of p is known, but π is not known, and it is
important to state the limits within which π may be expected to occur. As was
noted when discussing the confidence limits of the population mean, we
must first decide


TABLE 25.3

Values of a at Selected Lower and Upper Probability Points
for Specified Values of N

π = 0.20

Notes for the use of this table: (1) each a value shown for a lower probability point,
together with all a values smaller than the one shown, has the indicated probability or less,
(2) each a value shown for an upper probability point, together with all a values larger
than the one shown, has the indicated probability or less, (3) this table may be used
when π = 0.80 by reading N − a for a and reversing the lower and upper points.
what confidence limits we want. Of course, the size of the sample from
which p was computed must also be known. We shall proceed by
considering first an approximate method and then an exact method.

An approximate method. After nearly 23 years of use, the Chicago,
Milwaukee, St. Paul & Pacific Railway found that 22 out of 50 red oak
ties, which had been preserved by means of creosote applied by the "full
cell" process, were still in good condition.¹⁰ For this sample, N = 50,
a = 22, and p = 0.44. What are the 95 per cent confidence limits of π?
To obtain these two values, we employ the expression which has been
used before

$$\frac{x}{\sigma} = \frac{p - \pi}{\sigma_p},$$

but write it

$$\frac{x}{\sigma} = \frac{p - \pi}{\sqrt{\dfrac{\pi - \pi^2}{N}}}.$$

We know p and N. From Appendix H or the last row of Appendix I, we
obtain the x/σ value (1.96) associated with the 95 per cent confidence limits.
The three known values are substituted in the equation just given, and it
is solved¹¹ for π, giving:

$$
\begin{aligned}
1.96 &= \frac{0.44 - \pi}{\sqrt{\dfrac{\pi - \pi^2}{50}}}; \\
3.8416 &= \frac{0.1936 - 0.88\pi + \pi^2}{\dfrac{\pi - \pi^2}{50}}; \\
\frac{3.8416\pi - 3.8416\pi^2}{50} &= 0.1936 - 0.88\pi + \pi^2; \\
0.076832\pi - 0.076832\pi^2 &= 0.1936 - 0.88\pi + \pi^2; \\
0.1936 - 0.956832\pi + 1.076832\pi^2 &= 0; \\
\pi &= \frac{0.671125}{2.153664} \text{ and } \frac{1.242539}{2.153664}, \text{ so that} \\
\pi_1 &= 0.312 \text{ and } \pi_2 = 0.577.
\end{aligned}
$$

¹⁰ The data are from Proceedings of the American Wood Preservers' Association, 1935,
pp. 133-134.

¹¹ The quadratic 0.1936 − 0.956832π + 1.076832π² = 0 is solved by computing

$$\pi = \frac{-(-0.956832) \pm \sqrt{(-0.956832)^2 - 4(0.1936)(1.076832)}}{2(1.076832)}.$$

If the first equation were to be written

$$1.96 = \frac{a - \pi N}{\sqrt{N(\pi - \pi^2)}},$$

we would, initially, have only integers on the right.

What we did was to determine: (1) π_1 = 0.312, which is so located that
p = 0.44 cuts off the upper 2½ per cent tail of a normal curve around π_1
with

$$\sigma_p = \sqrt{\frac{\pi_1\tau_1}{N}} = \sqrt{\frac{(0.312)(0.688)}{50}} = 0.066,$$

and (2) π_2 = 0.577, which is so located that p = 0.44 cuts off the lower
2½ per cent tail of a normal curve around π_2 with

$$\sigma_p = \sqrt{\frac{\pi_2\tau_2}{N}} = \sqrt{\frac{(0.577)(0.423)}{50}} = 0.070.$$

Chart 25.3 illustrates what has been done.
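
The approximate method amounts to solving a quadratic in π, and the steps can be sketched as follows. Squaring x/σ = (p − π)/√((π − π²)/N) and collecting terms gives (1 + k)π² − (2p + k)π + p² = 0 with k = (x/σ)²/N; the function name and the symbol k are illustrative and do not appear in the text. Limits obtained this way are often called Wilson score limits.

```python
from math import sqrt


def approximate_limits(p, n, z):
    """Solve (1 + k)*pi^2 - (2p + k)*pi + p^2 = 0, k = z^2/n, for the two limits of pi."""
    k = z * z / n
    a_coef = 1.0 + k
    b_coef = -(2.0 * p + k)
    c_coef = p * p
    root = sqrt(b_coef * b_coef - 4.0 * a_coef * c_coef)
    lower = (-b_coef - root) / (2.0 * a_coef)
    upper = (-b_coef + root) / (2.0 * a_coef)
    return lower, upper


# Red oak ties: p = 0.44, N = 50, 95 per cent limits (x/sigma = 1.96).
pi_1, pi_2 = approximate_limits(0.44, 50, 1.96)
print(f"pi_1 = {pi_1:.3f}, pi_2 = {pi_2:.3f}")
```

For the tie data the coefficients are 1.076832, −0.956832, and 0.1936, matching the quadratic above.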



Chart 25.3. 95 Per Cent Confidence Limits of π, when p = 0.44 and N = 50,
Determined by Use of σ_p and Normal Curves. The cross-hatched area is 2.5 per
cent of the left curve; the stippled area is 2.5 per cent of the right curve.

The method just described gives satisfactory results when N is large
and when p is not too far removed from 0.50. Its shortcoming will be
apparent when we apply it to the following example.

Standard-strength digitalis was injected into each of 20 frogs. As a
result, 17 of them had rapid systolic standstills (they died). Other frogs
were injected with half-strength digitalis and with digitalis alleged to be
half-strength, but the results of those tests are of no concern to us in
connection with this example. For the group of frogs given full-strength
digitalis, N = 20 and p = 0.85. What are the 90 per cent confidence
limits of π? Proceeding as before, we first obtain the x/σ value of 1.645
from the last row of Appendix I, and then write

$$1.645 = \frac{0.85 - \pi}{\sqrt{\dfrac{\pi - \pi^2}{20}}},$$

which, when solved, yields

π_1 = 0.678 and π_2 = 0.938.


These results seem all right until we look at Chart 25.4, which shows what
we have done. Now, it is immediately apparent that the use of normal
curves cannot be justified, particularly for determining π_2. The normal
curve at the right indicates that values of p > 1.0 would occur, which is,
of course, impossible.

Chart 25.4. Unsatisfactory Approximation of the 90 Per Cent Confidence
Limits of π when p = 0.85 and N = 20, Determined by Use of σ_p and
Normal Curves. The cross-hatched area is 5 per cent of the left curve; the
stippled area is 5 per cent of the right curve. Horizontal axis: values of p.

The exact method. An exact determination of the confidence limits
of π for the full-strength digitalis data requires a much more laborious
procedure. Considering first the determination of π_1, we must ascertain
the value of π which, when inserted in the expression

$$(\tau B + \pi A)^{20},$$

will result in a = 17 (p = 0.85) cutting off the upper 5 per cent tail of the
binomial. This requires successive approximations, and we shall first try
π = 0.65. From Table 25.4, it may be seen that in the binomial
(0.35B + 0.65A)^20, the probability of obtaining a ≥ 17 is 0.0444. Since
this probability is less than 0.05, we must try a slightly larger value of π.
In the same table, it appears that, when π = 0.66, the probability of
obtaining a ≥ 17 is 0.0535. If two decimals are sufficient for π_1, we
would conclude that the lower 90 per cent confidence limit of π is 0.66, as
shown in the upper part of Chart 25.5. In the event that three decimals
are wanted for π_1, we would note that the next value to be tried for π_1
should be larger than 0.655. A value of 0.657 was tried, with the results
shown in the sixth and seventh columns of Table 25.4; for a ≥ 17, the
probability is seen to be 0.0506. Trying, next, π = 0.656, it is seen from
the table that the probability of a ≥ 17 is 0.0497. The value of π_1 lies
between 0.656 and 0.657, but closer to 0.656 than to 0.657.

Chart 25.5. 90 Per Cent Confidence Limits of π when N = 20 and
a = 17 (p = 0.85), Determined by Use of the Expression (τB + πA)^20. Data
from Tables 25.4 and 25.5. Axes: probability against value of a.

In order to obtain π_2, the upper 90 per cent confidence limit, we need to
determine the value of π which, when inserted in the expression

$$(\tau B + \pi A)^{20},$$

will result in a = 17 (p = 0.85) cutting off the lower 5 per cent tail of the
binomial. Since π_2 was 0.938 by the approximate method, we shall first
try π = 0.94. From Table 25.5, it is seen that a ≤ 17 includes 0.1150
of the binomial, and we next try π = 0.95. This value for π_2 results in a
probability of 0.0755 (see Table 25.5) for a ≤ 17, so we proceed to try
π = 0.96, which, as shown in Table 25.5, gives a probability of 0.0439 for
a ≤ 17. We conclude that π_2 = 0.96, and this is illustrated in the lower
part of Chart 25.5. Values of π intermediate between 0.95 and 0.96
could be tried, but we shall terminate the illustration at this point. The
90 per cent confidence limits (to two decimals) are π_1 = 0.66 and
π_2 = 0.96.

TABLE 25.4

Probabilities* and Cumulative Probabilities of Values of a in the Expression
(τB + πA)^20, when π = 0.65, 0.66, 0.657, and 0.656

(The probability of a ≥ 17 is shown in boldface type.)

* The non-cumulative probabilities may be computed as in Table 25.2. When π consists of not
more than two decimals, probabilities and cumulative probabilities may be obtained from National
Bureau of Standards, Tables of the Binomial Probability Distribution, Washington, 1950. The
cumulative figures shown above were obtained from the non-cumulative figures before the non-cumulative
figures were rounded.

The exact method of determining the confidence limits of π necessitates
two sets of trials for each different problem. Note that, in order to make
a useful estimate of the values of π_1 and π_2 which should be tried first,
the approximate solution using σ_p should ordinarily precede the exact
solution. If binomial tables, such as those mentioned in footnote 9, are
available, the approximate solution may be omitted.
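
The successive trials can be mechanized. The sketch below is one way of doing so, not the procedure of the text: it replaces the hand-chosen trial values with repeated halving of an interval (bisection), and all function names are illustrative. For π_1 it seeks the π at which the probability of a or more occurrences equals the tail area; for π_2, the π at which the probability of a or fewer equals it.

```python
from math import comb


def tail_at_least(n, a, pi):
    """P(a or more occurrences) in the expansion of (tau*B + pi*A)^n."""
    return sum(comb(n, k) * pi**k * (1.0 - pi) ** (n - k) for k in range(a, n + 1))


def tail_at_most(n, a, pi):
    """P(a or fewer occurrences) in the expansion of (tau*B + pi*A)^n."""
    return sum(comb(n, k) * pi**k * (1.0 - pi) ** (n - k) for k in range(0, a + 1))


def solve_increasing(h, lo=0.0, hi=1.0, steps=60):
    """Bisection for the point where an increasing function h changes sign on [lo, hi]."""
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_limits(n, a, tail_area):
    """Lower and upper confidence limits of pi, each cutting off tail_area of the binomial."""
    lower = solve_increasing(lambda pi: tail_at_least(n, a, pi) - tail_area)
    upper = solve_increasing(lambda pi: tail_area - tail_at_most(n, a, pi))
    return lower, upper


# Digitalis data: N = 20, a = 17, 90 per cent limits (5 per cent in each tail).
pi_1, pi_2 = exact_limits(20, 17, 0.05)
print(f"pi_1 = {pi_1:.4f}, pi_2 = {pi_2:.4f}")
```

The lower limit falls between 0.656 and 0.657 and the upper between 0.95 and 0.96, in line with the trials above.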



TABLE 25.5

Probabilities* and Cumulative Probabilities of Values of a in the Expression
(τB + πA)^20, when π = 0.94, 0.95, and 0.96

(The probability of a ≤ 17 is shown in boldface type.)
Chart 25.6. 95 Per Cent Confidence Limits of π for
Values of p from Samples of Various Sizes from 10 to
1,000. See note following title of Chart 25.7. Axes: values of π against values of p.
To avoid the arduous labor of expanding a number of binomials,
diagrams have been prepared by Clopper and Pearson which enable one
to read the lower and upper 0.95 and 0.99 confidence limits of π. These
are shown in Charts 25.6 and 25.7.

Chart 25.7. 99 Per Cent Confidence Limits of π for
Values of p from Samples of Various Sizes from 10 to
1,000. From C. J. Clopper and E. S. Pearson, "The Use of
Confidence or Fiducial Limits, Illustrated in the Case of
the Binomial," Biometrika. The π values are not completely
accurate, since the intermediate points were obtained by
interpolation, not by direct calculation. Axes: values of π
against values of p.

Significance of the Difference between p₁ and p₂

An approximate method. Reference was made earlier to 50 red oak
ties which had been preserved by means of creosote applied by the "full
cell" process. After 25 years of service, 22, or 44 per cent, of these ties
were still in service. When these ties were laid, another group of 50 red
oak ties, creosote-impregnated by the "Rueping" process, were also put
into use. Of this second group, 18 ties, or 36 per cent, were still in
service after the passage of 25 years. Now we have two samples: one,
on which the "full cell" process was used, had N₁ = 50, a₁ = 22, and
p₁ = 0.44; the other, on which the "Rueping" process was employed,
had N₂ = 50, a₂ = 18, and p₂ = 0.36. We wish to know whether there
is a significant difference, at the 0.05 level, between these two proportions.

The data are from Proceedings of the American Wood Preservers'
Association, 1935, pp. 133-134.

The procedure is essentially the same as that used for two sample
means; we shall compare the difference to the standard error of the
difference. The standard error of the difference between two percentages
is

$$\sigma_{p_1-p_2} = \sqrt{\frac{\pi(1-\pi)}{N_1} + \frac{\pi(1-\pi)}{N_2}}.$$

Now, we do not know π, and, if we did know π, we would almost certainly
wish to test p₁ against π and p₂ against π rather than to examine the
significance of p₁ − p₂. Since we do not know π, we make an estimate,
p̄, based on the information in both samples. Thus,

$$\bar{p} = \frac{a_1 + a_2}{N_1 + N_2} = \frac{22 + 18}{50 + 50} = 0.40.$$

Now we are in a position to compute

$$\sigma_{p_1-p_2} = \sqrt{\frac{\bar{p}(1-\bar{p})}{N_1} + \frac{\bar{p}(1-\bar{p})}{N_2}} = \sqrt{\frac{(0.40)(0.60)}{50} + \frac{(0.40)(0.60)}{50}} = 0.098,$$

and

$$\frac{x}{\sigma} = \frac{p_1 - p_2}{\sigma_{p_1-p_2}} = \frac{0.44 - 0.36}{0.098} = \frac{0.08}{0.098} = 0.82.$$

Referring to Appendix II, it appears that P = 0.41, and we conclude that
the difference between p₁ and p₂ is not significant.
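The approximate test for the two groups of ties can be checked with a few lines of code. This is a sketch under the assumptions already made in the text (pooled estimate of π, normal-curve probability in both tails); the function name is hypothetical, and the probability is taken from the standard library's `erfc` rather than from Appendix II.

```python
from math import sqrt, erfc

def two_proportion_test(a1, n1, a2, n2):
    """Approximate (normal-curve) test of the difference between two proportions."""
    p1, p2 = a1 / n1, a2 / n2
    p_bar = (a1 + a2) / (n1 + n2)           # pooled estimate of pi
    se = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se                       # x / sigma in the text's notation
    p_value = erfc(abs(z) / sqrt(2))         # two-tailed normal probability
    return se, z, p_value

# Full cell process: 22 of 50 in service; Rueping process: 18 of 50.
se, z, p_value = two_proportion_test(22, 50, 18, 50)
```

The standard error comes to about 0.098, the ratio to about 0.816, and the two-tailed probability to about 0.41, in agreement with the hand computation.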

Exact method. When the two samples from which p₁ and p₂ are
obtained are small, the approximate method just described should be
abandoned in favor of the exact method. Later in this chapter it will be
shown that a chi-square test for a "2 × 2" table is identical with the
p₁ − p₂ test given above. At that point the exact test will be described.

PART 2: THE CHI-SQUARE TEST

As we shall use it, in the present discussion, the χ² test consists of
summing a series of ratios, each ratio having been obtained by: (1) taking
the difference between an observed frequency (f) and an associated
population or computed frequency (f_c), (2) squaring this difference, and
(3) dividing the squared difference by f_c. Thus,

$$\chi^2 = \sum \frac{(f - f_c)^2}{f_c}.$$

In Chapter 26 we shall make use of a slightly different aspect of chi-square
when we compare s² and σ².


The 1 × 2 Table


Approximate method. To demonstrate the identity of the χ² test
and the p − π (or a − πN) test, we shall use the example employed
earlier in this chapter which involved a sample of 10 marbles, 9 of which
were black. Using 0.05 as our criterion, we tested the hypothesis that
the sample was a random one from a population having π = 0.50 by use
of σ_p and also by use of σ_a. If we make the same test by means of χ²,
we compute:


Color of    Observed number    Computed number if
marble      of marbles, f      1:1 ratio exists, f_c    f − f_c    (f − f_c)²    (f − f_c)²/f_c

Black              9                    5                  +4          16             3.2
White              1                    5                  −4          16             3.2

Total             10                   10                   0          ..             6.4


This is a 1 × 2 table, since the observed frequencies occupy 1 column and
2 rows. It is the simplest type of a one-column table. From the above
table, the value of χ² is seen to be 6.4, and we may determine the
probability of such a value of χ² (or greater) by referring to the table of
Appendix J for the appropriate number of degrees of freedom. For our
problem, n = 1. This is so because a figure may be freely entered in one of
the two boxes in the f-column. However, once this figure has been put down,
the second figure is thereupon determined, since the total is 10. From
Appendix J, when n = 1 and χ² = 6.4, the value of P is seen to be
slightly larger than 0.01, causing us to reject the hypothesis on the basis
of this approximate test. If a more detailed table of χ² values were
available, we would find P = 0.0114, exactly the same as for the test
involving σ_p (or σ_a). As a matter of fact, the p − π test (or the a − πN
test) and the χ² test must produce the same final P value. Note that the
x/σ value obtained for the p − π (or a − πN) test is the square root of the
χ² value. This can be seen in broader perspective if we look at the last
row of the t-table (Appendix I), which gives x/σ values for the normal
distribution, and the first row of the χ² table (Appendix J), which gives
χ² values when n = 1. For any given P value, the χ² value will be seen
always to be the square of the normal value.

We can also obtain this probability by looking up χ, not χ², in the
normal-curve table of Appendix II.

The values of χ² shown in the first row of Appendix J are obtained from
the distribution of χ² for one degree of freedom, which is pictured in Chart
25.8.

Chart 25.8. The χ² Distribution for n = 1, n = 2, n = 5, and n = 10.
(Horizontal axis: value of χ², from 0 to 30; one curve for each number of
degrees of freedom.)
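The marble computation, and the statement that χ² with n = 1 is the square of the normal value, can be illustrated briefly. This is a sketch only; the helper names are hypothetical, and the probability for one degree of freedom is obtained from the normal curve by way of the standard library's `erfc` instead of Appendix J.

```python
from math import sqrt, erfc

def chi_square(observed, expected):
    """Sum of (f - f_c)^2 / f_c over all categories."""
    return sum((f - fc) ** 2 / fc for f, fc in zip(observed, expected))

def p_one_degree(chi2):
    """P(chi-square >= chi2) for n = 1, using chi = sqrt(chi2) as a normal value."""
    return erfc(sqrt(chi2) / sqrt(2))

# 9 black and 1 white marble observed; 5 and 5 computed for a 1:1 ratio.
chi2 = chi_square([9, 1], [5, 5])
p = p_one_degree(chi2)
```

Here `chi2` is 6.4 and `p` is about 0.0114, the value a more detailed table would give.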

The χ² test tells us the probability of getting a disagreement between
observed and computed frequencies equal to or greater than that observed,
in either direction. For the marbles, the P value of a little more than 0.01
represented the probability of 9 or 10 black marbles and of 9 or 10 white
marbles. This is true even though only one tail of the chi-square
distribution (see Appendix J) is involved, because the f − f_c values were
squared.

Exact method. Chi-square is an approximate test, for the same
reason that the p − π (or a − πN) test was an approximate test; a
continuous distribution of sample values was assumed to exist, when actually
only the eleven terms of the binomial (0.50B + 0.50A)^10 can occur. The
exact procedure was set forth earlier in this chapter and it will not be
repeated here. The approximate method, using χ², may be employed in place
of the exact method, and the same conclusion arrived at, under exactly the
same conditions that the p − π (or a − πN) test may be used. These
conditions were discussed earlier in this chapter, for π = 0.50 and for
π ≠ 0.50 (the latter on page 671).
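For comparison with the approximate figure, the eleven binomial terms can be written out directly. The sketch below is an illustration, not a restatement of the earlier pages: it assumes that "as extreme as observed" means 9 or 10 of either color, and the function name is hypothetical.

```python
from math import comb

def binomial_terms(n, pi):
    """Probabilities of 0, 1, ..., n occurrences of A in ((1 - pi)B + pi*A)^n."""
    return [comb(n, a) * pi**a * (1 - pi)**(n - a) for a in range(n + 1)]

terms = binomial_terms(10, 0.5)

# 9 or 10 black marbles, or 9 or 10 white marbles (0 or 1 black).
p_exact = terms[9] + terms[10] + terms[0] + terms[1]
```

Under these assumptions the exact sum is 22/1024, or about 0.0215, which is larger than the 0.0114 given by the continuous approximation but still below the 0.05 criterion.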


Chart 25.8. The χ² Distribution for n = 1, n = 2, n = 5, and n = 10.

Note that different scales are used for the two parts of the chart. The
ordinates were computed from the expression

$$y = \frac{(\chi^2)^{\frac{n-2}{2}}\, e^{-\frac{\chi^2}{2}}}{2^{\frac{n}{2}}\,\Gamma\!\left(\frac{n}{2}\right)},$$

which is not difficult to solve if logarithms are used. The mode of the χ²
distribution is at χ² = n − 2, except when n = 1, and then the mode is at
zero, as may be seen above; the mean is at χ² = n. As shown in the lower
part of the chart, the skewness of the distribution decreases as the number
of degrees of freedom increases.

Confidence limits of π. As a matter of possible interest, it may be
noted that χ² may be used to determine the confidence limits of π. The
expression is



and it is the exact equivalent of the approximate method given earlier.
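One expression of this kind is sketched here as an assumption, not as the authors' own formula. Writing χ² for a 1 × 2 table with observed frequencies a and N − a and computed frequencies πN and (1 − π)N gives

$$\chi^2 = \frac{(a - \pi N)^2}{\pi N} + \frac{[(N - a) - (1 - \pi)N]^2}{(1 - \pi)N} = \frac{N(p - \pi)^2}{\pi(1 - \pi)}, \qquad p = \frac{a}{N}.$$

Setting χ² equal to the tabled value for n = 1 at the chosen probability (for example, 3.841 for 0.95 limits) and solving the resulting quadratic in π yields a lower and an upper limit.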

The 2 × 2 Table

Approximate method. As will shortly be made clear, the χ² test
for a 2 × 2 table leads to the same probability, and therefore the same
conclusion concerning a hypothesis, as does the p₁ − p₂ test described
earlier. To clarify this point, we shall use the same illustration that was
used for the p₁ − p₂ test. The data are now set up as in Table 25.6,
which we call a 2 × 2 table because it has two columns and two rows of
observed data. Two-column tables with more than two rows will be
considered later.

TABLE 25.6

Railroad Ties in Use at End of 25-Year Test Period
by Method Used to Apply Creosote Preservative

Process by which          In use at end of test period
creosote was applied           Yes          No           Total

Full cell                       22          28             50
Rueping                         18          32             50

Total                           40          60            100

Data from Proceedings of the American Wood Preservers' Association,
1935, pp. 133-134.


There are no population frequencies in Table 25.6, but we obtain
computed frequencies by noting that, if the ties treated by the two processes
showed no difference in regard to the number in use at the end of the test
period, we would expect the first box (Row 1, Column 1) to contain 40/100
of the 50 ties treated by the full cell process, and the second box (Row 1,
Column 2) would be expected to have 60/100 of the 50 ties treated by the
same process. In like fashion, the third box (Row 2, Column 1) would
have 40/100 of the 50 ties treated by the Rueping process and the fourth box
(Row 2, Column 2) would have 60/100 of the ties treated by this process.
These f_c values have been computed in Columns (2) and (3) of Table
25.7. In Columns (4), (5), (6), and (7) of that table, the computation
of χ² is carried out and χ² = 0.67. A 2 × 2 table, with marginal totals
set, has n = 1, as will be explained in the next paragraph. Referring to
Appendix J for n = 1 and χ² = 0.67 gives 0.30 < P < 0.50. A more
detailed table of χ² would show P = 0.41, the same as for the p₁ − p₂
test. Note, again, that the x/σ value for the p₁ − p₂ test, which was 0.82
(or 0.816 to three decimals), is the square root of the χ² value of 0.67.

TABLE 25.7

Computation of χ² for Data of Table 25.6

                   Determination of computed frequencies
                   Product of row and     f_c =
Cell               column totals          Col. (2) ÷ 100     f     f − f_c   (f − f_c)²   (f − f_c)²/f_c
(1)                (2)                    (3)               (4)      (5)        (6)           (7)

Row 1, column 1    50 × 40 = 2,000         20               22       +2          4           0.20
Row 1, column 2    50 × 60 = 3,000         30               28       −2          4           0.133
Row 2, column 1    50 × 40 = 2,000         20               18       −2          4           0.20
Row 2, column 2    50 × 60 = 3,000         30               32       +2          4           0.133

Total              ...                    100              100        0         ..           0.67

When the f_c entries are not integers, they should be carried to one decimal
in order that Σf_c will not differ from Σf by as much as 1. Actually, only
one of the f_c figures in Column (3) must be computed. The others may be
obtained by subtraction from the row and column totals of Table 25.6.
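The worksheet of Table 25.7 can be sketched in code. The function names below are hypothetical; the computed frequencies follow the rule used in Columns (2) and (3), namely the product of the row and column totals divided by the grand total.

```python
def expected_frequencies(table):
    """Computed frequencies f_c = (row total)(column total) / N for each box."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_table(table):
    """Chi-square summed over every box of a table with marginal totals set."""
    expected = expected_frequencies(table)
    return sum((f - fc) ** 2 / fc
               for row, exp_row in zip(table, expected)
               for f, fc in zip(row, exp_row))

ties = [[22, 28],    # full cell: in use, not in use
        [18, 32]]    # Rueping:   in use, not in use
f_c = expected_frequencies(ties)
chi2 = chi_square_table(ties)
```

For the tie data, `f_c` holds 20 and 30 in each row and `chi2` is about 0.67. The same two functions apply unchanged to tables with more rows or columns.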


That n = 1 for a 2 × 2 table with marginal totals set may be clarified
by considering this small table:


      [    ]    [    ]     100
      [    ]    [    ]     150

       130       120       250


which has the marginal totals given, but has no entries in the boxes. If
a figure is entered in any one box, it should be clear that the figures for the
other 3 boxes are thereupon determined. If 20 is written in the first box,
then the figure for the second box must be 80, for the third box 110, and
for the fourth box 40. Inasmuch as we were free to enter a figure in only
one box, there is but one degree of freedom. For tables larger than 2 × 2,
the same method will tell one the number of degrees of freedom if the
marginal totals are set. It is more expeditious, however, merely to
compute

n = (R − 1)(C − 1),

where R is the number of rows and C is the number of columns. The
following relationship may be of interest:

Degrees of freedom lost because of marginal totals     (R − 1) + (C − 1) + 1
Degrees of freedom remaining, n                        (R − 1)(C − 1)

Total (number of boxes)                                RC
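The argument that one entry fixes the other three can be made concrete. The sketch below uses hypothetical function names and the marginal totals implied by the example in the text (rows 100 and 150, columns 130 and 120).

```python
def fill_2x2(first_box, row_totals, col_totals):
    """Complete a 2 x 2 table from one entry and its marginal totals."""
    a1 = first_box
    b1 = row_totals[0] - a1      # remainder of row 1
    a2 = col_totals[0] - a1      # remainder of column 1
    b2 = row_totals[1] - a2      # remainder of row 2
    return [[a1, b1], [a2, b2]]

def degrees_of_freedom(rows, cols):
    """Degrees of freedom for an R x C table with marginal totals set."""
    return (rows - 1) * (cols - 1)

table = fill_2x2(20, row_totals=(100, 150), col_totals=(130, 120))
```

With 20 in the first box the function returns 80, 110, and 40 for the other boxes, and for any R and C the degrees of freedom lost plus those remaining add to the number of boxes, RC.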


The computation form shown in Table 25.7 is not required when χ² is
computed for a 2 × 2 table. It was given here in order to clarify the
procedure involved. The value of χ² for a 2 × 2 table may be obtained
more expeditiously by use of the expression,

$$\chi^2 = \frac{(a_1 b_2 - a_2 b_1)^2 N}{N_1 N_2 N_a N_b},$$

where the symbols refer to box and total frequencies as shown below:


      a₁      b₁      N₁
      a₂      b₂      N₂

      N_a     N_b     N


For the data of Table 25.6,

$$\chi^2 = \frac{[(22)(32) - (28)(18)]^2\,100}{(50)(50)(40)(60)} = \frac{(704 - 504)^2\,100}{(2500)(2400)} = \frac{4{,}000{,}000}{6{,}000{,}000} = 0.67.$$

This, of course, is the same value as obtained in Table 25.7.
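The shortcut expression translates directly into a small function. The name is hypothetical, and no continuity correction is applied.

```python
def chi_square_2x2(a1, b1, a2, b2):
    """Shortcut chi-square for a 2 x 2 table with marginal totals set."""
    n1, n2 = a1 + b1, a2 + b2          # row totals
    na, nb = a1 + a2, b1 + b2          # column totals
    n = n1 + n2
    return (a1 * b2 - a2 * b1) ** 2 * n / (n1 * n2 * na * nb)

# Railroad-tie data of Table 25.6.
chi2_ties = chi_square_2x2(22, 28, 18, 32)
```

The result, 4,000,000/6,000,000, rounds to 0.67 as in the worksheet.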

Exact procedure. When N is small, the probability given by the χ²
test is too small, with the result that the χ² test might lead to a
hypothesis being discredited, whereas the exact procedure might cause one
not to discredit a hypothesis.

Consider the following data dealing with two forms of treatment
applied to 16 laboratory animals which had previously been inoculated
with a virus.


                      Result
Treatment      Recovered     Died      Total

#1                 7           3         10
#2                 0           6          6

Total              7           9         16


The figures for the two treatments appear so divergent that it may seem to
the reader to be a waste of time to apply a statistical test. Nevertheless,
using 0.01 as our criterion, let us see whether there is a significant
difference between the two treatments. Our hypothesis is that the two
groups, of 10 and 6 animals, are from the same population in respect to the
proportions recovered or died. Using first the chi-square test, we get

$$\chi^2 = \frac{(a_1 b_2 - a_2 b_1)^2 N}{N_1 N_2 N_a N_b} = \frac{[(7)(6) - (0)(3)]^2\,16}{(10)(6)(7)(9)} = 7.47.$$

Referring to Appendix J for n = 1, we find P < 0.01 and, upon the basis
of this approximate test, would conclude that our hypothesis was
discredited. However, the probability is actually larger than indicated by
the χ² test or than by the p₁ − p₂ test, which we already know is the same
as the χ² test for this type of problem.

A degree of freedom is not lost because of every marginal total. If any one
vertical and any one horizontal total (including the grand total) are
deleted, they may be restored from the information given by the remaining
totals.

The probability of any arrangement of frequencies in the boxes of a
2 × 2 table, with marginal totals set, may be obtained from

$$\frac{N_1!\,N_2!\,N_a!\,N_b!}{N!\,a_1!\,b_1!\,a_2!\,b_2!}.$$

Solving this expression for the data resulting from the two treatments
gives

$$\frac{10!\,6!\,7!\,9!}{16!\,7!\,3!\,0!\,6!} = 0.0105.$$

This is the probability of the particular divergence which was observed.
If any greater differences between the two samples (treatments) are
possible, their probabilities must be added to this. (It will be
remembered that the χ² test and the p₁ − p₂ test give us the probability of
a difference equal to or greater than that which was observed.) The first
column of Table 25.8 shows all the possible combinations that will produce
the marginal totals of our problem. There are seven in all. From the
second column it may be seen that none of the combinations shows a
difference greater than and in the same direction as that which was
observed. However, Combination VII shows a greater difference in the
opposite direction. We therefore ascertain its probability, also, which is
0.0009. Adding the two probabilities for Combinations I and VII gives
0.0114 and leads us to a different conclusion from the one reached before:
the hypothesis is not discredited.
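The exact procedure can be sketched as follows. The function name is hypothetical, and the factorial expression is evaluated in its equivalent form as a ratio of binomial coefficients, C(N₁, a₁)·C(N₂, a₂)/C(N, N_a). "Greater difference" is taken here to mean an absolute difference p₁ − p₂ at least as large as the one observed, which is how Combinations I and VII are selected in the text.

```python
from math import comb

def exact_2x2_probabilities(n1, n2, na):
    """Probability of each possible first-box entry a1, marginal totals fixed.

    Equivalent to N1! N2! Na! Nb! / (N! a1! b1! a2! b2!).
    """
    n = n1 + n2
    probs = {}
    for a1 in range(max(0, na - n2), min(n1, na) + 1):
        a2 = na - a1
        probs[a1] = comb(n1, a1) * comb(n2, a2) / comb(n, na)
    return probs

# Treatment data: rows of 10 and 6 animals, 7 recoveries in all.
probs = exact_2x2_probabilities(n1=10, n2=6, na=7)
observed_diff = 7 / 10 - 0 / 6

# Add every arrangement whose |p1 - p2| is at least as large as that observed.
p_two_sided = sum(
    p for a1, p in probs.items()
    if abs(a1 / 10 - (7 - a1) / 6) >= abs(observed_diff) - 1e-12
)
```

Seven arrangements are possible; the observed one has probability about 0.0105, the opposite extreme about 0.0009, and their sum is about 0.0114, which exceeds the 0.01 criterion.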

TABLE 25.8

Values of p₁, p₂, and p₁ − p₂ and the Probability of Each of the Seven
Combinations Yielding the Marginal Totals Shown Below

(Every combination has row totals of 10 and 6, column totals of 7 and 9,
and a grand total of 16.)

                 Combination             Proportion of row total in          Probability of the
                 (box frequencies)       first column, and difference        combination from
                                                                             N₁!N₂!N_a!N_b! / (N!a₁!b₁!a₂!b₂!)
                 a₁    b₁    a₂    b₂      p₁       p₂      p₁ − p₂

I                 7     3     0     6     0.70     0         +0.70                0.0105
II                6     4     1     5     0.60     0.17      +0.43                0.1101
III               5     5     2     4     0.50     0.33      +0.17                0.3304
IV                4     6     3     3     0.40     0.50      −0.10                0.3671
V                 3     7     4     2     0.30     0.67      −0.37                0.1573
VI                2     8     5     1     0.20     0.83      −0.63                0.0236
VII               1     9     6     0     0.10     1.00      −0.90                0.0009

Total                                                                              0.9999


As a matter of possible interest, Table 25.8 shows the probability of
each of the seven combinations. Note that the seven probabilities add
to 1.0000. Because of rounding, the seven figures shown in Table 25.8
total 0.9999.

If we had merely been interested in knowing whether treatment No. 1
showed a larger proportion recovering than did treatment No. 2, we
would have halved the probability arrived at by the χ² test. This is
"less than 0.005" and involves the assumption that the distribution of
possible values is symmetrical, which is not the case. The correct
probability is 0.0105, the probability shown in Table 25.8 for Combination I.

Drawing conclusions concerning 2 × 2 tables with small frequencies may be
facilitated by use of a table, prepared by D. J. Finney and R. Latscha,
which shows the values significant at selected probability values when a₁,
N₁, and N₂ are fixed. Provision is made for consideration of 2 × 2 tables
ranging from N₁ + N₂ = 6 to N₁ + N₂ = 30. See E. S. Pearson and H. O.
Hartley, Biometrika Tables for Statisticians, Cambridge University Press,
Cambridge, England, 1954, pp. 65-72 and 188-193. The table originally
appeared in two parts in Biometrika, Vol. 35, parts 1 and 2, and Vol. 40,
parts 1 and 2.

In a practical situation, what should one do if, handling data such as
those for the two treatments, he is confronted by the conclusion which
was just arrived at? Further experimentation is certainly in order;
possibly larger samples may result in the appearance of a significant
difference, or, alternatively, may still fail to discredit the hypothesis.

Yates' correction. This correction, previously mentioned in connection
with the a − πN test, may also be applied to the χ² test for a 2 × 2
table when skewness is not present. The purpose is the same as
before: to modify the approximate test so that the probability resulting
from it will be in closer agreement with the exact test. Here too, Yates'
correction tends to over-correct. For the data of the two treatments,
the use of Yates' correction leads to a probability slightly larger than
0.025, which greatly exceeds that obtained by the exact method. As
stated before, the tendency to over-correct would sometimes lead us to
the conclusion that a difference was not significant, whereas the exact
procedure would indicate the presence of a significant difference.
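The effect of the correction on the treatment data can be checked with a short sketch. It assumes the usual form of Yates' correction, in which each absolute difference between an observed and a computed frequency is reduced by one-half before squaring; the function name is hypothetical, and the probability for n = 1 is taken from the normal curve through `erfc`.

```python
from math import sqrt, erfc

def chi_square_2x2_yates(a1, b1, a2, b2):
    """2 x 2 chi-square with each |f - f_c| reduced by 1/2 before squaring."""
    table = [[a1, b1], [a2, b2]]
    row_totals = [a1 + b1, a2 + b2]
    col_totals = [a1 + a2, b1 + b2]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            fc = row_totals[i] * col_totals[j] / n   # computed frequency
            chi2 += (abs(table[i][j] - fc) - 0.5) ** 2 / fc
    return chi2

# Treatment #1: 7 recovered, 3 died; treatment #2: 0 recovered, 6 died.
chi2_corrected = chi_square_2x2_yates(7, 3, 0, 6)
p_corrected = erfc(sqrt(chi2_corrected / 2))   # upper tail for n = 1
```

The corrected χ² is about 4.89, against 7.47 uncorrected, and the corresponding probability is a little above 0.025, well above the exact figure of 0.0114.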

The correction involves computing χ² from the expression

$$\chi^2 = \sum \frac{\left(|f - f_c| - \tfrac{1}{2}\right)^2}{f_c}.$$

For purposes of computation, a simpler form is available. It is not given
here because the use of Yates' correction is not recommended.

See also "Yates' Correction and the Statisticians," by Franz Adler, in
Journal of the American Statistical Association, December 1951, pp. 490-501.

1 × R Tables, Larger Than 1 × 2

A 1 × 3 table. Freshness has been an advertised feature of various
brands of coffee for many years. It occurred to one concern to attempt to
find out whether freshness really made any difference in the taste of
coffee. To that end, a fairly comprehensive investigation was undertaken.
One aspect involved 52 tasters, each of whom was given 6 cups
of coffee: 2 made from fresh coffee, 2 made from coffee 3 weeks old, and
2 made from coffee 5 weeks old. The tasters were asked to match the
duplicate cups. Now it is possible to make 15 different matchings of the
six cups. Of these 15, only one involves a correct matching of all three
pairs. There are six ways of having one pair correctly matched and
eight ways of having no pairs correctly matched. It is not possible to
match two pairs correctly. If no difference existed in the taste of fresh,
moderately stale, and stale coffee, we would expect the correct matchings
of three, one, and no pairs to be in the ratio 1:6:8. Table 25.9 shows the
observed data and the frequencies computed on the basis of these
proportions. From these two sets of figures, χ² is found to be 46.08. Since
the total is set and there are three categories of sample data, n = 2.
(The distribution of χ² for two degrees of freedom is shown in Chart 25.8.)
From Appendix J it may be seen that P is much less than 0.001, and it is
clear that the matchings differ significantly from a chance distribution.
Apparently it is possible to differentiate between fresh and stale coffee.
A point worth noting, however, is that the data were so presented by the
company that it was not possible to determine, when only a single pair
was matched, how frequently the matching consisted of the two fresh
cups, or the two cups made from 3-weeks-old coffee, or the two cups made
from 5-weeks-old coffee! Furthermore, the tasters did not identify the
matched cups as "fresh," "moderately stale," and "stale."
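Both the 1:6:8 ratio and the value of χ² can be verified by enumeration. The sketch below is illustrative only: the function name is hypothetical, the observed counts of 15, 24, and 13 tasters are as read from Table 25.9 and should be checked against the printed table, and the computed frequencies are rounded to one decimal as in that table.

```python
from collections import Counter

def pairings(items):
    """Yield every way of splitting `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

# Six cups, labeled by the age of the coffee from which each was made.
cups = ["fresh", "fresh", "3 weeks", "3 weeks", "5 weeks", "5 weeks"]

# Number of correctly matched pairs -> number of matchings giving that result.
correct_counts = Counter(
    sum(cups[i] == cups[j] for i, j in matching)
    for matching in pairings(list(range(6)))
)

tasters = 52
ratio = {3: 1, 1: 6, 0: 8}
observed = {3: 15, 1: 24, 0: 13}     # as read from Table 25.9
computed = {k: round(tasters * w / 15, 1) for k, w in ratio.items()}
chi2 = sum((observed[k] - computed[k]) ** 2 / computed[k] for k in ratio)
```

The enumeration yields 1, 6, and 8 matchings with three, one, and no correct pairs, and none with exactly two; with the rounded computed frequencies the sum is 46.08.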

TABLE 25.9

Computation of χ² for Matching of Pairs of Cups of Coffee Made
from Fresh, Three-Weeks-Old, and Five-Weeks-Old Coffee

Number of pairs                 f_c
correctly matched       f      (1:6:8)     f − f_c    (f − f_c)²    (f − f_c)²/f_c

Three                  15        3.5        +11.5       132.25          37.79
One                    24       20.8         +3.2        10.24           0.49
None                   13       27.7        −14.7       216.09           7.80

Total                  52       52.0          0           ..            46.08


Other 1 × R tables. For tables having one column and more than
three rows of observed data, the procedure would be similar to that shown
for a 1 × 3 table in Table 25.9. The degrees of freedom would be R − 1,
unless the f and f_c values had been made to agree in regard to more
characteristics than just the total. Tables having one row and C columns
are rarely encountered, because they are apt to be of unwieldy proportions.
Such a table could be recast into a 1 × R table.

Test of "goodness of fit" as a special case of a 1 × R table. In
Chapter 23, a normal curve was fitted to data of baseball throws for
distance by first-year high school girls. Columns (2) and (3) of Table
25.10 show the observed data and the computed frequencies. From
these two sets of figures, χ² is found to be 6.65. Now the observed and
the fitted data have been forced to agree with each other in regard to
X̄, s, and N. Therefore, three degrees of freedom were lost. Since the
observed data are in 13 categories, we have n = 13 − 3 = 10. The
distribution of χ² for n = 10 is shown in Chart 25.8. From Appendix J it
is seen that P is more than 0.75 but less than 0.80, and we conclude that
the agreement between the observed and computed frequencies is
satisfactory; we have no reason to doubt the hypothesis that the sample was
a random one from a normal population.

Note that the expression (R − 1)(C − 1) is not applicable to a 1 × R table.
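The goodness-of-fit computation is sketched below. The frequencies are as read from Columns (2) and (3) of Table 25.10 and should be checked against the printed table; the function name is hypothetical. Because n = 10 is even, the upper-tail probability has a closed form and no statistical library is needed.

```python
from math import exp, factorial

def chi_square_upper_tail_even(chi2, n):
    """P(chi-square >= chi2) for an even number of degrees of freedom n."""
    if n % 2:
        raise ValueError("the closed form used here requires even n")
    half = chi2 / 2
    return exp(-half) * sum(half**k / factorial(k) for k in range(n // 2))

# Observed and computed frequencies for the 13 distance classes.
f  = [1, 2, 7, 25, 33, 53, 64, 44, 31, 27, 11, 4, 1]
fc = [1.1, 3.2, 9.1, 20.2, 35.0, 50.6, 57.4, 52.0, 37.0, 22.0, 10.2, 3.7, 1.5]

chi2 = sum((o - c) ** 2 / c for o, c in zip(f, fc))
n = len(f) - 3          # the mean, the standard deviation, and N were made to agree
p = chi_square_upper_tail_even(chi2, n)
```

Carried without intermediate rounding, the sum falls within a few hundredths of the tabled figure, and the probability lies between 0.75 and 0.80.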

TABLE 25.10

Chi-Square Test of Goodness of Fit for Normal Curve Fitted to Baseball
Throws for Distance by First-Year High School Girls

                          Observed       Expected
Distance in feet          frequency      frequency
                             f              f_c         f − f_c    (f − f_c)²    (f − f_c)²/f_c
(1)                         (2)            (3)            (4)         (5)            (6)

Under 25                      1             1.1           −0.1         0.01          0.01
25 but under 35               2             3.2           −1.2         1.44          0.45
35 but under 45               7             9.1           −2.1         4.41          0.48
45 but under 55              25            20.2           +4.8        23.04          1.14
55 but under 65              33            35.0           −2.0         4.00          0.11
65 but under 75              53            50.6           +2.4         5.76          0.11
75 but under 85              64            57.4           +6.6        43.56          0.76
85 but under 95              44            52.0           −8.0        64.00          1.23
95 but under 105             31            37.0           −6.0        36.00          0.97
105 but under 115            27            22.0           +5.0        25.00          1.14
115 but under 125            11            10.2           +0.8         0.64          0.06
125 but under 135             4             3.7           +0.3         0.09          0.02
135 or more                   1             1.5           −0.5         0.25          0.17

Total                       303           303.0            0            ..           6.65


Data from tables of Chapter 23.

To avoid the marked influence of small absolute differences of f and f_c in
the end classes, it is not unusual to group several classes at each end
when making a test of "goodness of fit." Because the distribution of f
values around f_c does not correspond to the expected distribution when f_c
is small, it has been recommended that no class should have less than 5 or
10 computed frequencies. However, it has been shown that if the correction
for continuity is used, the f_c frequencies need not be this large. See
W. G. Cochran, "The χ² Correction for Continuity," Iowa State College
Journal of Science, Vol. XVI, July 1942, pp. 421-436.

2 × 3 and Larger Tables

2 × R tables. For tables having two columns and R rows of observed
data, it is not necessary to use a worksheet such as that in Table 25.7.
Using the symbols to have the meanings indicated in the following table,


      a₁      b₁      N₁
      a₂      b₂      N₂
      a₃      b₃      N₃
      .       .       .
      .       .       .

      N_a     N_b     N


the value of χ² may be computed from the expression

$$\chi^2 = \frac{N^2}{N_a N_b}\left(\frac{a_1^2}{N_1} + \frac{a_2^2}{N_2} + \frac{a_3^2}{N_3} + \cdots - \frac{N_a^2}{N}\right).$$

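A sketch of the two-column computation follows. It assumes the computing expression has the form χ² = [N²/(N_a N_b)]·[Σ(a²/N_row) − N_a²/N], where the sum runs over the R rows; the function name is hypothetical. As a check, the function is applied to the railroad-tie data of Table 25.6, for which R = 2 and the answer is already known.

```python
def chi_square_2_by_r(a, b):
    """Chi-square for a table of two columns (a, b) and R rows, marginal totals set."""
    row_totals = [ai + bi for ai, bi in zip(a, b)]
    na, nb = sum(a), sum(b)
    n = na + nb
    bracket = sum(ai**2 / ni for ai, ni in zip(a, row_totals)) - na**2 / n
    return n**2 / (na * nb) * bracket

# Railroad-tie data: first column in use, second column not in use.
chi2_ties = chi_square_2_by_r(a=[22, 18], b=[28, 32])
```

The result rounds to 0.67, as before. Because the bracket is a small difference between two nearly equal quantities, the individual terms should be carried to several decimals when the work is done by hand.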
From information provided by selective service registrants examined
for military service, sample data were obtained of the number of
left-handed and right-handed registrants who were examined in the six army
areas. The proportions of left-handed varied from 7.8 per cent in Area
IV to 9.2 per cent in Area II. Applying a χ² test to the data of Table
25.11 enables us to ascertain whether the proportions of left- and right-

TABLE 25.11

Number of Left-Handed and Right-Handed Registrants in a Sample* of
Those Examined in Each of the Six Army Areas


Army area      Left-handed      Right-handed      Total

U»i* i 

' ■ J 

1 ,05»i; ;; 

1 .707 

II 

225 i 

2 1').', , 

2,1! s 

III 

105 ■ 

2.i:w j 

2./, 2; 

JV 1 

157 ! 

I -j 

1 .71. A 

\‘ i 

250 i 

2.517 i= 

2.517 

VI 

120 1 

l.l'U 1 

1 .511 


* Yj Va ■ 

'li.no:)" 3 

12 , 1 50^ 


* i.l*‘ of tfu* n-i orrlM inoivni ‘-N 

ttie W- pait'uunt of on fntn' l'>, June 

an i Jt.ro .'ii‘ 1 

Data from "Prevalence of Left-Handedness Among Selective Service
Registrants," by B. D. Karpinos and H. A. Grossman, Human Biology,
Vol. 25, No. 1.

handed differed significantly in the various army areas. From this table
we compute


! 12,1 ■',())'> 

( 1, 00 l)f 11.09.')) 


(AtiD- (22:?i'' 
i 1,797 ’ 2,1 IS 


( 19:5) • 
2,32:5 


.‘5.98. 


(1.37)'“ , (230)- 
1,763 2,-647 

(l_20) ’ _ (1,0()4)'-’| 
i;5H 12,159 ) 


In order to ascertain the number of degrees of freedom, we compute
n = (R − 1)(C − 1) = (5)(1) = 5. The distribution of χ² for n = 5 is
shown in Chart 25.8. From Appendix J we find that P is between 0.50
and 0.70, and we conclude that the proportions of left-handed and
right-handed from the six areas are not significantly different.

For tables having C columns and two rows, the expression just used for
χ² may also be used, with appropriate changes of symbols. Alternatively,
the table may be rearranged into two columns.

Tables having three or more columns and three or more rows, with
marginal totals set, are most expeditiously handled by means of a
computation form such as Table 25.7. The degrees of freedom are
(R − 1)(C − 1).

When making chi-square tests, a very large probability may
occasionally appear. Some writers have pointed out that a probability of
0.99 is just as unusual as 0.01, and that, if we were to consider 0.01 as
discrediting a hypothesis, then 0.99 just as clearly discredits a hypothesis
as does a probability of 0.01. It is true that an occurrence having a
probability of 0.99 is just as surprising as an occurrence having a
probability of 0.01, but it does not follow that a probability of 0.99
discredits the hypothesis. The startling agreement between sample and
population or between two samples should lead us to look, more carefully
than usual, for possibly "rigged" data, for arithmetic mistakes, for
previous smoothing of the data if "goodness of fit" is involved, or for a
carelessly designed experiment.

As a matter of fact, either extremely large or surprisingly small values
of P should cause us to re-examine the situation. Consider the following
incident which was mentioned on page 12: When fluorescent lighting was
first introduced, some persons believed that radiation from the lights
would sterilize people. Hoping to allay their fears, a railroad, which had
already installed the lights, subjected one group of rats to incandescent
light and a second group to fluorescent light. The first group had the
usual number of offspring; the second group had none. This seemed,
indeed, to reinforce the fears of those who thought that the fluorescent
lights might sterilize. The result seemed so surprising that one executive
asked that the second group of rats be carefully checked. Upon
examination, they were found to be all of the same sex.


A discussion appears in "Too Good to Be True," by Alan Stuart, Applied
Statistics, March 1954, pp. 29-32.

Variances



(r : the inean 

k: nuinher of 

L: the ratio of the geometric mean of several variances to their arithmetic
mean.

n: degrees of freedom.

n₁, n₂, n₃, ...: respectively, degrees of freedom in samples 1, 2, 3, ....
n_k refers to the number of degrees of freedom in the k'th sample.

N: number of items in a sample.

N₁, N₂, N₃, ...: respectively, number of items in samples 1, 2, 3, ....
N_k refers to the number of items in the k'th sample.

N (with subscript): used in connection with L to indicate the number of
items in any one of several samples of equal size.

P: probability; varies from 0 to 1.
s²: the variance of a sample.
s₁²: the variance of sample 1.
s₂²: the variance of sample 2.
σ²: population variance.
σ₁²: the lower confidence limit of σ².
σ₂²: the upper confidence limit of σ².

σ̂²: the estimated variance of a population obtained from a sample.

σ̂₁², σ̂₂², σ̂₃², ...: respectively, estimates of population variance from
samples 1, 2, 3, .... σ̂_k² refers to the estimate from the k'th sample.

Σ: upper-case Greek sigma, meaning "take the sum of."

x₁: a deviation of a value in sample 1 from X̄₁; Σx₁² = Σ(X₁ − X̄₁)².

x₂: a deviation of a value in sample 2 from X̄₂; Σx₂² = Σ(X₂ − X̄₂)².

X̄₁: the arithmetic mean of sample 1.
X̄₂: the arithmetic mean of sample 2.

χ²: see Chapter 25. The symbol is a lower-case Greek chi.

∞: infinity.

Analysis of Variance

f : the ratio of two estimates of 
kt: the number of bo\es. 
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kc'> the number of columns. 
kr'. the number of rows. 
n\ degrees of freedom. 

n\: degrees of freedom associated with the numerator of F. 

712 ' degrees of freedom associated with the* denominator of F, 

N: number of items in all rows, all columns, or all boxes. 
iVt! number of items in a box. 

Nc: number of items in a column. 

Nr', number of items in a row. 

Niy Nzy • • ■ : respectively, the number of items in Columns 1, 2 

3, • • • . 

P: probability ; varic's from 0 to I. 

v 

estimale of }'>opulation variance using i.'(.Y — X}'\ 

1 

2^: upper-cas(‘ (Jreek sigma, meaning “take the sum of.** 

kh 

2: a summation ov'er the kt boxes. 

1 


2: a summation over the kc columns. 
1 


hr 

2; a .summation over the rows. 

i 

.V 

2: a summation over all items. Same as 2. 

1 

Nb 

2: a s\irnmation cner the Nt, items in a box. 

1 

*Vo 

2: a summation over the items in a eohman. 
1 


N, 

2: a suinmaticm over tin* Nr items in a row. 

i 

i: see (Chapter 21. t VF when ^ 1. 

X: an observed value. 

X: the arithmetic mean of all the it(*mvS, the “grand mean.’^ 

Xb\ the arithmetic mean of a box. 

Xc'. the arithmetic mean of a column. 

^r' the arithmetic mean of a row. 

Xi, Y2, X3, • • • : respectively the arithmetic means of Columns 1, 2, 


3, * • . 

X** chi-square; see Chapter 25. 


X" 


n 


F when N^ 
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Skewness and Kurtosis 

$\beta_1$: lower-case Greek beta; measure of skewness in a sample. See Chapter 10.

$\beta_2$: lower-case Greek beta; measure of kurtosis in a sample. See Chapter 10.

$N$: number of items in a sample.

Correlation Coefficients

$b$: slope of the estimating equation $Y_c = a + bX$.

$F$: a ratio between two estimated variances.

Vr.x’ lower-case Greek eta, tlic sfpiare of tin* (Correlation ratio based on 
column means (see Chaptin* 20): sometimes refeiT(‘d to as ilu' '‘ratio of 
determination.’^ 

lower-case Greek eta; i>o])iilation estimate of tiv.k- 
m: number of constants in an t'stimating c(|uation. For tin' correlation 
ratio 171. A, m is the number of columns, 
$n$: degrees of freedom.

$n_1$ and $n_2$: respectively, degrees of freedom associated with the numerator and the denominator of $F$.

$N$: number of items in a sample. In two-variable linear or non-linear correlation, $N$ is the number of pairs of items. In multiple or partial correlation, $N$ is the number of sets of observations.

$N_1$ and $N_2$: respectively, the number of pairs of items from which $r_1$ and $r_2$ were computed.

$P$: probability; varies from 0 to 1.

$r$: sample coefficient of correlation, linear correlation of two variables. When two samples are under consideration, we use $r_1$ and $r_2$.
population eoefficicnt of correlation, linear correlation of two variables. 
Ty/. lower confidence limit of a,.. 

upper confid^'iice limit (if r.i- 
P: estimated \aalu(' (jf r~>: oht aim'd from a sam[)le. 

co(*tficient of [uiitial d(‘t('rmination. Sve Gliapt(T 21 

„ 1,: a general form f)f the cr)eflicient of partial di'termination for 
in varialdcs 

I ; estimated [yopulatnin ^'alue of ,,, i,. 

ri? M.ry ji, rj\ ^ the three forms of the coefficient of partial determination 
for four variables, when A'l is the dt'pt'iuhmt \'arial>lo. 

■ coefficient of partial determination ; the additional variation in Y 
explained by A^-, exfiressc'd as a prijportioii of the variation in Y which 
was unexplained liv A'. 
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coefficient of determination for .V and K, l^he estimating; equatioii 
Yc — a + bX + cX^ having been used. 

^l.xx^’ population estimate of 

coefficient of partial determination; the additional variation in 
Y explained by Y®, expressed as a proportion of tin* variation in ]' 
which was unexplained by X and 

'^'y.xx^x^’ coefficient of determination for A" and the estimating equation 
Yc = a + bX + cA"- -h dX^ having been us<‘d. 
rl,xx^x^‘ population estimate of 

coefficient of multiple determination; the proportion of variation 
in A"i whi(‘h was explained by and .V,. 

coefficitmt of multiple (h'termination • the projiortion of variation 
in Ai which was explained by A%, A.^. and .V,,. 

^^o.234 -m' ^ general form of the r oetlicient of multiph^ ([('termination f(n’ 
m variables. 

/m. 234 -m - estimated population vahn^ of 
i^y: *-ovJ. variance of the V serie.s. 

.Syyi the square of the standard (*iTor of (\stirnat(‘ for the (‘slimating 
equation - a h bX\ inu-xplaiiH'd varunn'e 
0*^: estimated variariee in a p()j)ulatiori. 

: ('stimated population variance (total varianc<0 of tlie Y series. 
v.Y* population (?stimat(* of th(‘ unexplanu'd \ariance resulting from use 
of the estimating cciuation Yc ~ u + bX. 

(j^ \ standard error of .r. 

(T,,..,,: standard error of Z] — zn. 

X \ upper-case Ureek sigma, meaning ^‘take the sum of 
2.rJ: total variation in the Xi senes. 

explained variation resulting from us(‘ of the estimating equation 
A%i .23 -- di.u "b bi 2..\\2 + ?A 

2.r,“i 234: explained variation re.sulting ln>m use of the (\stimating ecpiation 

A*c 1.234 ~ ^ 1 . 2 'M + 612 .hA 2 + /q.^o^A ] + ?> 14 . 2 >V.,. 

geiK'ral form, explained variation resulting from use of the 
estimating equation A%i.234. -m = U] 2..4 . u. mAj +-61321 ,..V, 

+■ bj 4 23 • • mX 4 +-•••+- him 23 (.„ p A 

2^7,034. .(rn^i)' explained variation n'subing from use of the esti- 
mating equation A’'ci.234.. =- Ui 2u (v. o f 612.34 .(m-pA+ + 

6 i 3 , 24 . . (ffi- 1 iA 3 +- 6 ] 4.23 . . . (m-l) A'4 L . . . 6^ . . (,n- 2 i A f _ p . 

^.rji o.i: unexplaiiu'd variation n'sulting from use of the (estimating ecpia- 
tiun shown for 2.rAo3. 

1.2 14- un(>x])lained variation resulting from use of the estimating equa- 
tion shown for ..,,4. 
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a general form; unexplained variation resulting from use of 
the estimating equation shown for 2 x^i .234 . .m- 

2 x^ 1 . 234 ... (rn-i/. unexplained variation resulting from use of the esti- 
mating equation shown for 2 x^ 1 234 .. 
total variation of the Y series. 

St/J: explained variation resulting from use of the estimating e<piation 
Yc-= a + hX. 

^UcY.xx*- explained variation resulting from use of the estimating e(iua- 
tion Yc = a + bX + e.Y-. 

explaim-d variation resulting from nsc of the estimating etjua- 
tion F, = n + hX -f r.Y^ -f- dXK 

Sy*: unexplained variation resulting from use of the estimating equation 

F, = tt + hX. 

2y;v. unexplained variation resulting from use of the estirnali/ig equa- 


tion Yc - a + bX -f cX\ 

^UsY.xxKx*' unexplained variation resulting from use of the estimating 
equation Yc = a + bX -h cX^ -f 
!r-iX — rn) . 

P* x!" : ~ ’ or an equivalent expression (see note 15). r- may be 

^ 1 — r' 

cither a two-variable linear coeffieiont of determination or a partial 
coefficient of determination. 


a deviation divided by iis standard error; for example, - or 

' <r, ' 

$X$: an observed value in the $X$ series; also, the $X$ series.

$X_1, X_2, X_3, X_4, \cdots$: respectively, the $X_1, X_2, X_3, X_4, \cdots$ series; also, observed values in those series. Thus, we may refer to correlating $X_1$ with $X_2$, $X_3$, and $X_4$, but $\Sigma X_1$ means “take the sum of the values in the $X_1$ series.”

$\bar{X}$: the arithmetic mean of the $X$ series.

$y$: $Y - \bar{Y}$.

$y_c$: $Y_c - \bar{Y}$. See also $\Sigma y_c^2$ with additional subscripts.

$y_s$: $Y - Y_c$. See also $\Sigma y_s^2$ with additional subscripts.

$Y$: an observed value in the $Y$ series; also, the $Y$ series.

$\bar{Y}$: the arithmetic mean of the $Y$ series.

$Y_c$: a computed $Y$ value.

$z$: $1.15129 \log \dfrac{1 + r}{1 - r}$. When two samples are under consideration, we use $z_1$ and $z_2$ to correspond to $r_1$ and $r_2$.

1 + r(? 


2 (p: 1 . 15120 Jog 


I -- 


Z(p,: lower confidence limit of zc- 
^(?t* ^pper confidence limit of Z(y. 



CHAPTER 26 


Statistical Significance III: 
Variances, Analysis of Variance, 
Measures of Skewness and Kurtosis, 
and Correlation Coefficients 


In this, the last chapter of the book, we shall give attention to variances
computed from samples, the variance of several means (analysis of
variance), values of $\beta_1$ and $\beta_2$ obtained from samples, and correlation
coefficients.

VARIANCES

Our consideration of sample variances, $\hat{\sigma}^2$, will parallel the treatment
of arithmetic means and proportions in that we shall first consider the
difference between $\hat{\sigma}^2$ and $\sigma^2$; next we shall obtain confidence limits of $\sigma^2$;
and then we shall compare two sample variances. In addition, we shall
give attention to one way of comparing several sample variances.

Variances of random samples from a normal population are distributed
neither normally nor symmetrically. Their distribution follows a skewed
curve (skewed to the right), the exact shape of which depends upon $\sigma^2$
and $N$. Since tables giving values of $\hat{\sigma}^2$ for several values of $P$ would
have to have both $\sigma^2$ and $N$ as arguments, and would therefore be very
extensive, it is fortunate that $(N - 1)\hat{\sigma}^2/\sigma^2$ follows the chi-square
distribution for $N - 1$ degrees of freedom. Thus, we write

$$\chi^2 = \frac{(N - 1)\hat{\sigma}^2}{\sigma^2}.$$

In the event that $s^2$ is given, rather than $\hat{\sigma}^2$, we may obtain $\chi^2$ from the
expression

$$\chi^2 = \frac{N s^2}{\sigma^2}.$$

Alternatively, we may apply the $\chi^2$ test in the form

$$\frac{\chi^2}{n} = \frac{\hat{\sigma}^2}{\sigma^2},$$

with $n = N - 1$ for $\chi^2$.

Significance of the difference between $\hat{\sigma}^2$ and $\sigma^2$. Below Table
24.1 it may be seen that the value of $\hat{\sigma}^2$ for 10 pieces of hard-drawn copper
wire was 75.73. In this case, as in most others, we do not know the value
of $\sigma^2$, but, for purposes of illustration, we shall assume that $\sigma^2 = 46.42$
and test the hypothesis that $\hat{\sigma}^2 = 75.73$ is the variance of a random
sample from a population having $\sigma^2 = 46.42$. We shall use 0.05 as our
criterion. Computing $\chi^2$,

$$\chi^2 = \frac{(N - 1)\hat{\sigma}^2}{\sigma^2} = \frac{(9)(75.73)}{46.42} = 14.683$$

for $n = N - 1 = 9$. From the $\chi^2$ Table of Appendix J, it is seen that, if
$\sigma^2 = 46.42$, the probability of obtaining $\hat{\sigma}^2 = 75.73$ or larger, for samples
of 10, is almost exactly 0.10. Our hypothesis is not discredited. Note
that, in this application, $\chi^2$ has provided us with a one-tail test, since the
probability which was obtained refers to values of $\hat{\sigma}^2$ equal to or larger
than that observed.
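The computation just described may be sketched in a few lines of Python. This is only an illustration: the function name `chi2_upper_tail` is our own, and it takes the place of the printed table of Appendix J by evaluating the upper-tail probability of $\chi^2$ from the closed-form series that holds for whole-number degrees of freedom.

```python
import math

def chi2_upper_tail(chi2, df):
    """Probability that chi-square with integer `df` exceeds `chi2`."""
    x = chi2 / 2.0
    if df % 2 == 0:
        # Even df = 2m: Q = exp(-x) * sum_{j<m} x^j / j!
        m = df // 2
        return math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(m))
    # Odd df = 2m + 1: Q = erfc(sqrt(x)) + exp(-x) * sum_{j<m} x^(j+1/2) / Gamma(j+3/2)
    m = (df - 1) // 2
    series = sum(x ** (j + 0.5) / math.gamma(j + 1.5) for j in range(m))
    return math.erfc(math.sqrt(x)) + math.exp(-x) * series

n_items = 10          # pieces of copper wire
est_var = 75.73       # estimated variance from the sample
pop_var = 46.42       # population variance assumed for the illustration
df = n_items - 1

chi_square = df * est_var / pop_var
p_upper = chi2_upper_tail(chi_square, df)

print(f"chi-square = {chi_square:.3f} with n = {df}")
print(f"probability of a value this large or larger = {p_upper:.3f}")
```

Since the probability is close to 0.10 and our criterion is 0.05, the sketch leads to the same conclusion as the table.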

If we are interested in considering values of $\hat{\sigma}^2$ which are less than $\sigma^2$,
more than one avenue of approach is open to us. We may ascertain the
probability of a value of $\hat{\sigma}^2$ showing the same absolute difference, but in
the opposite direction. That is, $\hat{\sigma}^2 = 17.11$. Alternatively, we may
determine the value of $\hat{\sigma}^2$ which cuts off the lower 10 per cent tail of the
distribution of $\chi^2$ for $n = 9$. Considering these two, in turn, we find
that, when $\hat{\sigma}^2 = 17.11$,

$$\chi^2 = \frac{9(17.11)}{46.42} = 3.317,$$

and the probability is about 0.05 that values of $\hat{\sigma}^2$ equal to or smaller than
17.11 would occur. The value of $\hat{\sigma}^2$ which cuts off the lower 10 per cent
tail of the distribution of $\chi^2$ is obtained by using the $\chi^2$ value for $P = 0.90$
when $n = 9$ in Appendix J. This is 4.168, and we write

$$4.168 = \frac{9\hat{\sigma}^2}{46.42},$$

$$9\hat{\sigma}^2 = 193.47856,$$

$$\hat{\sigma}^2 = 21.50.$$

The fact that the $\chi^2$ test involves the ratio of $\hat{\sigma}^2$ to $\sigma^2$ may have already
suggested to the reader that, when $n = 9$ and when $\chi^2 = 14.684$ (the
value of $\chi^2$ at the upper 0.10 point), the resulting probability of 0.10 may
refer to any pair of values for $\hat{\sigma}^2$ and $\sigma^2$ giving the ratio $14.684 \div 9 = 1.632$.
Whenever $\hat{\sigma}^2/\sigma^2 = 1.632$, the value of $\chi^2$ will be at the upper 0.10
point. In symbols,¹

$$\frac{\chi^2}{n} = \frac{\hat{\sigma}^2}{\sigma^2},$$

and from this relationship the table of Appendix K was prepared. This
table enables one to compute sampling limits of $\hat{\sigma}^2$ merely by dividing
$\hat{\sigma}^2$ by $\sigma^2$, thus making it unnecessary to compute $\chi^2$. In the preceding
illustration, where $\hat{\sigma}^2 = 17.11$ and $\sigma^2 = 46.42$, the ratio is 0.3686. Looking
up this ratio in Appendix K for $n = 9$ gives a probability (lower point)
of about 0.05, the same as obtained before.
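The arithmetic of the lower-tail illustration can be retraced as follows. The sketch is ours; the two chi-square values (4.168 and 14.684) are the tabled figures quoted in the text and are entered as constants rather than computed.

```python
pop_var = 46.42          # assumed population variance
df = 9                   # n = N - 1
observed_est_var = 75.73

# A sample value showing the same absolute difference in the opposite direction.
lower_est_var = pop_var - (observed_est_var - pop_var)

# Chi-square for that value, and the ratio looked up in a table of chi-square / n.
chi_square_lower = df * lower_est_var / pop_var
ratio_lower = lower_est_var / pop_var

# Value of the estimated variance cutting off the lower 10 per cent tail,
# using the tabled chi-square for P = 0.90 and n = 9.
chi2_at_p90 = 4.168
est_var_at_lower_10 = chi2_at_p90 * pop_var / df

# Ratio of estimated to population variance at the upper 0.10 point.
chi2_at_p10 = 14.684
ratio_at_upper_10 = chi2_at_p10 / df

print(round(lower_est_var, 2), round(chi_square_lower, 3), round(ratio_lower, 4))
print(round(est_var_at_lower_10, 2), round(ratio_at_upper_10, 3))
```

Working with the ratio alone gives the same information as $\chi^2$ itself, which is why a table of $\chi^2/n$ can replace the separate computation.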

Confidence limits of $\sigma^2$. We may also employ $\chi^2$ to obtain the
confidence limits of $\sigma^2$. For the data of hard-drawn copper wire, $\hat{\sigma}^2 = 75.73$
and $N = 10$. What are the 90 per cent confidence limits of $\sigma^2$?
To answer this question, we use two chi-square values from Appendix J
for $n = 9$: one at the upper 0.05 point and one at the lower 0.05 point (the
0.95 point in Appendix J). These values are 16.919 and 3.325, and
we solve $\chi^2 = \dfrac{n\hat{\sigma}^2}{\sigma^2}$ for $\sigma^2$:

$$16.919 = \frac{(9)(75.73)}{\sigma_1^2},$$

$$16.919\,\sigma_1^2 = 681.57,$$

$$\sigma_1^2 = 40.28,$$

and

$$3.325 = \frac{(9)(75.73)}{\sigma_2^2},$$

$$3.325\,\sigma_2^2 = 681.57,$$

$$\sigma_2^2 = 205.0.$$

The 90 per cent confidence limits of $\sigma^2$ are 40.28 and 205.0. As before,
if we compute many such 90 per cent limits from random samples from
a normal population, our statements will include the population value
90 per cent of the time and fail to include it 10 per cent of the time.

¹ The ratio $\dfrac{\hat{\sigma}^2}{\sigma^2} = \dfrac{\chi^2}{n}$ is a special case of $F$ (see page 720) when $n_2 = \infty$.

Rodger P. Doyle computed the 90 per cent confidence limits* of $\sigma^2$ for
each of Shewhart’s 1,000 samples from a normal population. His limits
included $\sigma^2$ in 904 instances but did not do so for 96 of the samples.

We may recast the expression

$$\chi^2 = \frac{n\hat{\sigma}^2}{\sigma^2}$$

to read

$$\frac{\sigma^2}{\hat{\sigma}^2} = \frac{n}{\chi^2}$$

to enable us to make a table from which to obtain the confidence limits of
$\sigma^2$. Such a table is given as Appendix L. Using it to get the 90 per cent
confidence limits of $\sigma^2$, when $n = 9$, which were just obtained by use of
$\chi^2$, we would compute

$$\sigma_1^2 = 0.5319\,\hat{\sigma}^2 = (0.5319)(75.73) = 40.28,$$

and

$$\sigma_2^2 = 2.707\,\hat{\sigma}^2 = (2.707)(75.73) = 205.0.$$
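The two routes to the confidence limits, dividing $n\hat{\sigma}^2$ by the tabled $\chi^2$ values or multiplying $\hat{\sigma}^2$ by $n/\chi^2$, may be compared in a short sketch. The chi-square values are those quoted from Appendix J; the variable names are our own.

```python
df = 9
est_var = 75.73

chi2_upper_05 = 16.919   # tabled value at the upper 0.05 point, n = 9
chi2_lower_05 = 3.325    # tabled value at the 0.95 point, n = 9

# Route 1: solve chi-square = n * est_var / sigma^2 for sigma^2.
lower_limit = df * est_var / chi2_upper_05
upper_limit = df * est_var / chi2_lower_05

# Route 2: multipliers n / chi-square, the quantity a table such as Appendix L lists.
lower_multiplier = df / chi2_upper_05
upper_multiplier = df / chi2_lower_05

print(round(lower_limit, 2), round(upper_limit, 1))
print(round(lower_multiplier, 4), round(upper_multiplier, 3))
```

Note that the larger chi-square value yields the lower limit, since $\chi^2$ appears in the denominator.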


Significance of the difference between two sample variances.
In Chapter 24 we considered the significance of the difference between the
mean lengths of two sets of lower first molars which had $N_1 = 16$, $s_1 = 0.72$,
$N_2 = 9$, and $s_2 = 0.62$. We previously found that there was not a
significant difference between $\bar{X}_1$ and $\bar{X}_2$. Using the 0.05 level as our
criterion, let us now test the hypothesis that the two samples were from
the same population in respect to $\sigma^2$.

When $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$ are independent estimates of $\sigma^2$ from the same normal
population, their ratio $\dfrac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2}$ is distributed according to the $F$ distribution
with $n_1 = N_1 - 1$ and $n_2 = N_2 - 1$ degrees of freedom. If $\hat{\sigma}_1^2 = \hat{\sigma}_2^2$, the
value of $F$ is 1.0. Values of $F$ vary from 0 to $0.999\cdots$ when $\hat{\sigma}_1^2 < \hat{\sigma}_2^2$
and from $1.000\cdots1$ to $\infty$ when $\hat{\sigma}_1^2 > \hat{\sigma}_2^2$. The $F$ distribution is
“reverse-J” shaped when $n_1 = 1$ or $n_1 = 2$ and skewed to the right when
$n_1 \geq 3$. Several $F$ distributions are shown in Chart 26.1.

* From unpublished material.

[Chart 26.1. Distribution of $F$ for $n_1 = 1$, $n_2 = 5$; $n_1 = 2$, $n_2 = 5$; and
$n_1 = 3$, $n_2 = 1$. Vertical axis: height of ordinate. Horizontal and vertical
scales extend to $\infty$. The ordinates of the $F$ distribution are obtained from
the expression

$$Y_c = \frac{\left(\dfrac{n_1 + n_2 - 2}{2}\right)!}{\left(\dfrac{n_1 - 2}{2}\right)!\left(\dfrac{n_2 - 2}{2}\right)!} \cdot \frac{n_1^{n_1/2}\, n_2^{n_2/2}\, F^{(n_1 - 2)/2}}{(n_1 F + n_2)^{(n_1 + n_2)/2}}.]$$

For the data of lower first molars we found, in Chapter 24, $\Sigma x_1^2 = 8.29$
and $\Sigma x_2^2 = 3.46$. Consequently,

$$\hat{\sigma}_1^2 = \frac{\Sigma x_1^2}{N_1 - 1} = \frac{8.29}{16 - 1} = 0.553,$$

$$\hat{\sigma}_2^2 = \frac{\Sigma x_2^2}{N_2 - 1} = \frac{3.46}{9 - 1} = 0.432,$$

and

$$F = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2} = \frac{0.553}{0.432} = 1.28,$$

with $n_1 = 15$ and $n_2 = 8$. Values of $F$ for selected values of $n_1$ and $n_2$,
and for probabilities of 0.10, 0.05, 0.025, 0.01, and 0.001 in the right tail
of the distribution are given in Appendix M. Referring to that appendix,
we find that $n_1 = 15$ is not given, but $n_1 = 12$ and $n_1 = 24$ are given, and
so is $n_2 = 8$. It is not necessary to interpolate for $n_1 = 15$, since the
probability of $F \geq 1.28$ exceeds 0.10 whether we consider $n_1 = 12$ and
$n_2 = 8$ or $n_1 = 24$ and $n_2 = 8$. The observed value of $\hat{\sigma}_1^2$ does not
significantly exceed the observed value of $\hat{\sigma}_2^2$. But what about differences in
the reverse direction?

If $\hat{\sigma}_1^2$ had been 0.432 with $N_1 = 16$ and $\hat{\sigma}_2^2$ had been 0.553 with $N_2 = 9$,
then we would have

$$F = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2} = \frac{0.432}{0.553} = 0.781,$$

with $n_1 = 15$ and $n_2 = 8$. Now, the table of Appendix M does not include
any $F$ values smaller than 1.0. When a value of $F$ is less than one,
we can obtain the probability³ of that $F$ value or less by computing $\dfrac{1}{F}$,
which will exceed 1.0, and reversing the degrees of freedom. That is, we
would look up

$$\frac{1}{0.781} = 1.28$$

with $n_1 = 8$ and $n_2 = 15$. Doing this, we find that the probability of
$F \geq 1.28$ when $n_1 = 8$ and $n_2 = 15$ is more than 0.10; therefore, the
probability is also more than 0.10 for a value of $F \leq 0.781$ with $n_1 = 15$ and
$n_2 = 8$.
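A brief sketch may help to fix the two steps: forming $F$ from the sums of squared deviations, and inverting an $F$ below 1.0 so that a right-tail table can be used. The function names are our own, and no probability is computed; that step is still left to the table.

```python
def variance_ratio(sum_sq_1, items_1, sum_sq_2, items_2):
    """Return (F, n1, n2), each estimate being sum of squares / (N - 1)."""
    n1 = items_1 - 1
    n2 = items_2 - 1
    return (sum_sq_1 / n1) / (sum_sq_2 / n2), n1, n2

def as_right_tail(f_value, n1, n2):
    """For F < 1, return 1/F with the degrees of freedom reversed."""
    if f_value < 1.0:
        return 1.0 / f_value, n2, n1
    return f_value, n1, n2

# Lower first molars: sums of squared deviations 8.29 (N = 16) and 3.46 (N = 9).
f_value, n1, n2 = variance_ratio(8.29, 16, 3.46, 9)
print(round(f_value, 2), n1, n2)

# The reversed illustration: estimates of 0.432 (n1 = 15) and 0.553 (n2 = 8).
f_small = 0.432 / 0.553
f_lookup, lookup_n1, lookup_n2 = as_right_tail(f_small, 15, 8)
print(round(f_small, 3), "->", round(f_lookup, 2), lookup_n1, lookup_n2)
```

The first ratio is formed here from the unrounded estimates, so it may differ from the text in the third decimal place while still rounding to 1.28.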

Comparison of several values of $\hat{\sigma}^2$. Sometimes it is important to
know whether uniformity exists between several values of $\hat{\sigma}^2$. A pencil
manufacturing concern made tests of the strength of the lead of their own
pencils and of pencils manufactured by five of their competitors. The
tests included five pencils of each hardness, 1, 2, 2.5, 3, and 4, from each
of the six companies. Each individual pencil was tested four times.

For five Number 2 pencils, made by a company which we shall call
“Company D,” the tests⁴ showed $\hat{\sigma}_1^2 = 0.01316$, $\hat{\sigma}_2^2 = 0.05667$,
$\hat{\sigma}_3^2 = 0.02787$, $\hat{\sigma}_4^2 = 0.01930$, $\hat{\sigma}_5^2 = 0.01529$. $N_1 = N_2 = N_3 = N_4 = N_5 = 4$.
One way to compare these variances would be to compute $F$ for $\hat{\sigma}_1^2$ and
$\hat{\sigma}_2^2$, for $\hat{\sigma}_1^2$ and $\hat{\sigma}_3^2$, and so on. Another procedure involves comparing all
of the $\hat{\sigma}^2$ values at once by means of the measure⁵ $L$, sometimes referred
to as a criterion of likelihood.


³ An abbreviated table, prepared by the authors of this volume and showing both
upper and lower points, may be found in F. E. Croxton, Elementary Statistics with
Applications in Medicine, Prentice-Hall, Inc., New York, 1953, pp. IIS 4 1135.
⁴ The test data are shown in Table 26.3.

⁵ See J. Neyman and E. S. Pearson, “On the Problem of k Samples,” Akademija
Umiejętności, Bulletin International de l'Académie Polonaise des Sciences et des Lettres,
Série A, Sciences Mathématiques, 1931, pp. 460-481.

$$L = \frac{\sqrt[k]{\hat{\sigma}_1^2 \times \hat{\sigma}_2^2 \times \cdots \times \hat{\sigma}_k^2}}{\dfrac{1}{k}\left(\hat{\sigma}_1^2 + \hat{\sigma}_2^2 + \cdots + \hat{\sigma}_k^2\right)}$$

if $N_1 = N_2 = \cdots = N_k$. If the samples include varying numbers of
items,

$$L = \frac{\sqrt[n]{(\hat{\sigma}_1^2)^{n_1} \times (\hat{\sigma}_2^2)^{n_2} \times \cdots \times (\hat{\sigma}_k^2)^{n_k}}}{\dfrac{1}{n}\left(n_1\hat{\sigma}_1^2 + n_2\hat{\sigma}_2^2 + \cdots + n_k\hat{\sigma}_k^2\right)},$$

where $n = n_1 + n_2 + \cdots + n_k$. The numerator is the geometric
mean of the $\hat{\sigma}^2$'s while the denominator is the arithmetic mean of the $\hat{\sigma}^2$'s.

We already know (Chapter 9) that the geometric mean of a series of
values, which are not all the same, is smaller than the arithmetic mean of
those values. Also, the more divergent the values, the greater the
difference between $G$ and $\bar{X}$. Now, if $\hat{\sigma}_1^2 = \hat{\sigma}_2^2 = \cdots = \hat{\sigma}_k^2$, a condition
of maximum uniformity obtains, and the value of $L$ is 1.0. If there is any
difference between the $\hat{\sigma}^2$'s, the value of $L$ will be less than 1.0, approaching
0 as its lower limit. $L = 0$ represents a condition of maximum
non-uniformity and is a theoretical limit which would not be approached
in actual practice.

Computing $L$ for the five Number 2 pencils made by Company D gives

$$L = \frac{\sqrt[5]{0.01316 \times 0.05667 \times 0.02787 \times 0.01930 \times 0.01529}}{\frac{1}{5}(0.01316 + 0.05667 + 0.02787 + 0.01930 + 0.01529)} = \frac{0.02278}{0.02646} = 0.86.$$

It would appear, since 0.86 is not far removed from 1.0, that uniformity
exists among the five values of $\hat{\sigma}^2$. However, we want to know whether
$L = 0.86$ differs significantly from 1.0. The hypothesis to be tested is
that the five variances were from random samples from the same population
in regard to $\sigma^2$. The distribution of $L$, for samples drawn from a
normal population, is J-shaped, as shown by the small chart above
Appendix N. This appendix gives values of $L$ at the 0.05 and 0.01 points
for various values of $N_i$ and $k$, where $N_i$ refers to the number of items in
any one of the samples of equal size. For our problem, $N_i = 4$ and
$k = 5$, and, from Appendix N, it is seen that $L = 0.491$ is at the 0.05
point while $L = 0.370$ is at the 0.01 point. It is clear that the observed
value of $L = 0.86$ does not differ significantly from 1.0; the hypothesis is
not discredited.

Values of $L$ were computed for the variances of Number 2 pencils made
by each of the other five companies. In one instance, $L = 0.30$ with
$N_i = 4$ and $k = 5$ as before. This value for $L$ is beyond the 0.01 point
and would be considered significantly different from 1.0.
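The measure $L$ lends itself to a compact sketch. The function below is our own illustration of the ratio of the weighted geometric mean to the weighted arithmetic mean, with the degrees of freedom as weights; when the weights are omitted, equal samples are assumed. The 0.05 and 0.01 points are the tabled values quoted in the text for $N_i = 4$ and $k = 5$.

```python
import math

def likelihood_criterion(variances, dfs=None):
    """L = weighted geometric mean / weighted arithmetic mean of the variances."""
    if dfs is None:
        dfs = [1] * len(variances)   # equal samples: the weights cancel
    total_df = sum(dfs)
    log_geometric = sum(n * math.log(v) for n, v in zip(dfs, variances)) / total_df
    arithmetic = sum(n * v for n, v in zip(dfs, variances)) / total_df
    return math.exp(log_geometric) / arithmetic

company_d = [0.01316, 0.05667, 0.02787, 0.01930, 0.01529]
L = likelihood_criterion(company_d)

point_05 = 0.491   # tabled L at the 0.05 point
point_01 = 0.370   # tabled L at the 0.01 point

print(round(L, 2))
print("significant at 0.05:", L < point_05)
print("significant at 0.01:", L < point_01)
```

Small values of $L$ indicate non-uniformity, so the observed value is judged significant only when it falls below the tabled point.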


ANALYSIS OF VARIANCE 


In Chapter 24 we considered the significance of the difference between
two means. The discussion of analysis of variance, which follows, deals
with two or more means. In its simplest aspect, analysis of variance will
have to do with two independent estimates of $\sigma^2$ which will be compared
with each other by means of $F$.

One criterion of classification. In Table 26.1, data are shown of
the length of eggs of the European cuckoo found in the nests of three
other species of birds. The European cuckoo makes a practice of
permitting other birds to hatch its eggs and rear its offspring. We are
interested in knowing whether the mean lengths of cuckoo eggs found in
the nests of the hedge-sparrow, the robin, and the wren are significantly
different from each other. We shall not compare the first mean with the
second, the first with the third, and the second with the third. We shall
consider the three means as a group, comparing the estimated variance of
those three means (one estimate of the variance in the population) with
the estimated variance within the three columns (a second estimate of
the variance in the population).

The data of Table 26.1 are classified according to one criterion: the
species of bird in whose nest the cuckoo’s eggs were found. For such a table,
there are three sources of variation.

1. Variation between column means. The variation between column
means is obtained by taking the differences between each column mean
($\bar{X}_1, \bar{X}_2, \bar{X}_3, \cdots$) and the “grand mean” ($\bar{X}$, the arithmetic mean of all
the values), squaring each difference, multiplying each squared difference
by the number of items in the appropriate column ($N_1, N_2, N_3, \cdots$),
and summing. Symbolically, this is

$$N_1(\bar{X}_1 - \bar{X})^2 + N_2(\bar{X}_2 - \bar{X})^2 + N_3(\bar{X}_3 - \bar{X})^2 + \cdots.$$

Using $\bar{X}_c$ to indicate a column mean, $N_c$ the number of items in a column,
and $k_c$ the number of columns, variation between column means may be
written

$$\sum_1^{k_c} N_c(\bar{X}_c - \bar{X})^2,$$

where $\sum_1^{k_c}$ indicates that a summation over the $k_c$ columns is to be made.

The expression just given calls for the computation of $k_c$ column means
and the grand mean. This is not necessary, as it is shown in Appendix S,
section 26.1, that⁶

$$\sum_1^{k_c} N_c(\bar{X}_c - \bar{X})^2 = \sum_1^{k_c}\left[\frac{\left(\sum_1^{N_c} X\right)^2}{N_c}\right] - \frac{(\Sigma X)^2}{N},$$

where $\sum_1^{N_c}$ refers to a summation of the $N_c$ items in a column and
$N = N_1 + N_2 + \cdots + N_{k_c}$. From the computations shown below Table 26.1,

$$\sum_1^{k_c}\left[\frac{\left(\sum_1^{N_c} X\right)^2}{N_c}\right] - \frac{(\Sigma X)^2}{N} = 22,311.15 - \frac{1,002,601.69}{45} = 22,311.15 - 22,280.04 = 31.11.$$

⁶ If $N_1 = N_2 = N_3 = \cdots$, the expression $\sum_1^{k_c}\left[\dfrac{\left(\sum_1^{N_c} X\right)^2}{N_c}\right]$ may be written $\dfrac{\sum_1^{k_c}\left(\sum_1^{N_c} X\right)^2}{N_c}$.

TABLE 26.1

Computation of Values Required for Analysis of Variance of Data
of Length of Cuckoo's Eggs Found in the Nests of Three Species
of Birds

          Hedge-sparrow             Robin                   Wren
         X_1       X_1^2        X_2       X_2^2        X_3       X_3^2
        22.0      484.00       21.8      475.24       19.8      392.04
        23.9      571.21       23.0      529.00       22.1      488.41
        20.9      436.81       23.3      542.89       21.5      462.25
        23.8      566.44       22.4      501.76       20.9      436.81
        25.0      625.00       22.4      501.76       22.0      484.00
        24.0      576.00       23.0      529.00       21.0      441.00
        21.7      470.89       23.0      529.00       22.3      497.29
        23.8      566.44       23.0      529.00       21.0      441.00
        22.8      519.84       23.9      571.21       20.3      412.09
        23.1      533.61       22.3      497.29       20.9      436.81
        23.1      533.61       22.0      484.00       22.0      484.00
        23.5      552.25       22.6      510.76       20.0      400.00
        23.0      529.00       22.0      484.00       20.8      432.64
        23.0      529.00       22.1      488.41       21.2      449.44
                               21.1      445.21       21.0      441.00
                               23.0      529.00
Total  323.6    7,494.10      360.9    8,147.53      316.8    6,698.78

Data from Oswald H. Latter, “The Egg of Cuculus Canorus,” Biometrika, Vol. I,
p. 173.

$N = 45$.

$\Sigma X = 323.6 + 360.9 + 316.8 = 1,001.3$.

$(\Sigma X)^2 = 1,002,601.69$.

$\Sigma X^2 = 7,494.10 + 8,147.53 + 6,698.78 = 22,340.41$.

$\sum_1^{k_c}\left[\dfrac{\left(\sum_1^{N_c} X\right)^2}{N_c}\right] = \dfrac{(323.6)^2}{14} + \dfrac{(360.9)^2}{16} + \dfrac{(316.8)^2}{15} = 22,311.1495$.


2. Variation within columns. Variation within columns is the variation
of the values in the columns from the column means. It is obtained
by taking the difference between each item in a column and the column
mean, squaring the differences, summing the squared differences for the
column, performing the same operations for the other columns, and
summing the sums for the columns. Symbolically, variation within
columns is

$$\sum_1^{k_c}\left[\sum_1^{N_c}(X - \bar{X}_c)^2\right].$$

This expression involves the computation of $k_c$ column means and the
determination of $N$ differences. These operations are unnecessary, since
Appendix S, section 26.2 shows that

$$\sum_1^{k_c}\left[\sum_1^{N_c}(X - \bar{X}_c)^2\right] = \sum_1^{N} X^2 - \sum_1^{k_c}\left[\frac{\left(\sum_1^{N_c} X\right)^2}{N_c}\right],$$

and, again referring to the computations below Table 26.1, we find

$$\Sigma X^2 - \sum_1^{k_c}\left[\frac{\left(\sum_1^{N_c} X\right)^2}{N_c}\right] = 22,340.41 - 22,311.15 = 29.26.$$

3. Total variation. Total variation is the sum of the squared deviations
of all the values from the grand mean. It is the same as $Ns^2$, where $s$ is
the standard deviation, which was explained in Chapter 10. Symbolically,
total variation is

$$\sum_1^{N}(X - \bar{X})^2.$$

It is not necessary to obtain the $N$ deviations called for by this expression,
since, by a procedure similar to that shown in Appendix S, section 10.2,
it may be shown that

$$\sum_1^{N}(X - \bar{X})^2 = \Sigma X^2 - \frac{(\Sigma X)^2}{N}.$$

For the cuckoo-egg data,

$$\Sigma X^2 - \frac{(\Sigma X)^2}{N} = 22,340.41 - \frac{1,002,601.69}{45} = 22,340.41 - 22,280.04 = 60.37.$$

Notice that the sum of the first two values which we obtained equals the
third value. That is: variation between column means + variation
within columns = total variation. This is true for all problems such as
this, since

$$\left\{\sum_1^{k_c}\left[\frac{\left(\sum_1^{N_c} X\right)^2}{N_c}\right] - \frac{(\Sigma X)^2}{N}\right\} + \left\{\Sigma X^2 - \sum_1^{k_c}\left[\frac{\left(\sum_1^{N_c} X\right)^2}{N_c}\right]\right\} = \Sigma X^2 - \frac{(\Sigma X)^2}{N}.$$

As will be seen later, no use will be made of the numerical value for
total variation. Nevertheless, it is well to compute it as a check on the
other values.
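The three variations may be obtained together from the short-cut expressions, as in the sketch below. The egg lengths are transcribed from Table 26.1; the function name is our own, and the column totals should be checked against the table before relying on the transcription.

```python
hedge_sparrow = [22.0, 23.9, 20.9, 23.8, 25.0, 24.0, 21.7,
                 23.8, 22.8, 23.1, 23.1, 23.5, 23.0, 23.0]
robin = [21.8, 23.0, 23.3, 22.4, 22.4, 23.0, 23.0, 23.0,
         23.9, 22.3, 22.0, 22.6, 22.0, 22.1, 21.1, 23.0]
wren = [19.8, 22.1, 21.5, 20.9, 22.0, 21.0, 22.3, 21.0,
        20.3, 20.9, 22.0, 20.0, 20.8, 21.2, 21.0]

def one_way_variations(columns):
    """Return (between column means, within columns, total) variation."""
    n_items = sum(len(col) for col in columns)
    grand_sum = sum(sum(col) for col in columns)
    sum_of_squares = sum(x * x for col in columns for x in col)
    # Sum over columns of (column sum)^2 / N_c.
    column_term = sum(sum(col) ** 2 / len(col) for col in columns)
    correction = grand_sum ** 2 / n_items       # (sum of X)^2 / N
    between = column_term - correction
    within = sum_of_squares - column_term
    total = sum_of_squares - correction
    return between, within, total

between, within, total = one_way_variations([hedge_sparrow, robin, wren])
print(round(between, 2), round(within, 2), round(total, 2))
```

Because the same two intermediate sums enter all three expressions, the check that between plus within equals total holds apart from rounding.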

Estimated variances. It is our objective to compare the estimated
variance between column means with the estimated variance within
columns in order to ascertain whether the column means differ more than
might be accounted for by chance. The estimated variance within
columns is our yardstick of chance variance, since the variation of the
items in the columns is not affected by differences between $\bar{X}_1$, $\bar{X}_2$,
$\bar{X}_3, \cdots$. Estimated variance is obtained from variation by dividing
variation by the appropriate number of degrees of freedom. For our
problem, estimated variance between column means has $n = 2$, since the
deviations of the three column means were taken from $\bar{X}$. For estimated
variance within columns, $n = N_1 - 1 + N_2 - 1 + N_3 - 1 = 14 - 1 +
16 - 1 + 15 - 1 = 42$, since the deviations in each column were taken
from the column mean.

The computation of the estimated variances is indicated in Table 26.2,
and from these we get

$$F = \frac{15.56}{0.6967} = 22.3,$$

with $n_1 = 2$ and $n_2 = 42$. The $F$ table of Appendix M does not contain
a row for $n_2 = 42$, but it is, nevertheless, clear that the probability of
getting $F \geq 22.3$ is much less than 0.001, and we conclude that there is a
real difference between the mean lengths of the eggs found in the nests of
the three species of birds.⁷ It is of interest that later non-statistical
investigations revealed that European cuckoos exhibit what is known as
host specificity,⁸ which means that “different tribes, or gentes, exist
within the species, even in the same area, each adherent to a different
host species and each specialized in at least one respect for that one
species.”

TABLE 26.2

Summary of Computations for Analysis of Variance of Data of
Length of Cuckoo's Eggs

Source of variation        Amount of     Degrees of    Estimated
                           variation     freedom       variance
Between column means         31.11           2          15.56
Within columns               29.26          42           0.6967
Total                        60.37          44

The hypothesis which we tested was that the estimated variance
between column means and the estimated variance within columns were
from the same population with respect to $\sigma^2$. The hypothesis was
discredited. If a sample is drawn from a normal homogeneous population,
we could expect the two estimated variances just mentioned and $\hat{\sigma}^2$ (an
estimate based on total variation) to be equally good estimates of $\sigma^2$.
But if heterogeneity is present, as it was in our illustration, the estimated
variance between column means and $\hat{\sigma}^2$ are both affected by that
heterogeneity. Estimated variance within columns is not affected, and
therefore provided our measure of chance variance.

The $F$ test for the data of length of cuckoo's eggs involved a situation
in which $n_1 = 2$ and $n_2 = 42$. If we had had two columns of observed
data in Table 26.1, instead of three columns, $n_1$ would have been 1 and our
problem would have been that of testing the significance of the difference
between $\bar{X}_1$ and $\bar{X}_2$, which was considered in Chapter 24. In fact,
whenever an estimated variance has $n_1 = 1$ in an $F$ test, the $t$ test is an
alternative which yields the same probability. This will be clear if we
look at Appendices I and M. From these it may be seen that, for any
given probability, the value for $t^2$ is the same as the value for $F$ when $n$
for $t$ equals $n_2$ for $F$ and when $n_1$ for $F$ is 1. An instance in which the
$t$ test could be used in place of $F$ occurs in the test of the estimated
variance between column means shown in Table 26.6.

⁷ L. H. C. Tippett comes to the same conclusion using data of cuckoo’s eggs in the
nests of six species of birds. See his The Methods of Statistics, Williams and Norgate,
Ltd., London, 1937, 2nd Ed., pp. 132-134.

⁸ See “Social Parasites Among Birds,” by Alden H. Miller, The Scientific Monthly,
Vol. LXII, p. 243.
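The equivalence of $t^2$ and $F$ when $n_1 = 1$ can be verified numerically. The sketch below is an illustration of our own: it takes only the hedge-sparrow and wren columns of Table 26.1 and assumes the pooled-variance form of $t$ for the difference between two means, which may differ in detail from the form used in Chapter 24.

```python
import math

hedge_sparrow = [22.0, 23.9, 20.9, 23.8, 25.0, 24.0, 21.7,
                 23.8, 22.8, 23.1, 23.1, 23.5, 23.0, 23.0]
wren = [19.8, 22.1, 21.5, 20.9, 22.0, 21.0, 22.3, 21.0,
        20.3, 20.9, 22.0, 20.0, 20.8, 21.2, 21.0]

def mean(values):
    return sum(values) / len(values)

def pooled_t(a, b):
    """t for the difference between two means, using the pooled variance."""
    mean_a, mean_b = mean(a), mean(b)
    within = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    pooled_variance = within / (len(a) + len(b) - 2)
    standard_error = math.sqrt(pooled_variance * (1 / len(a) + 1 / len(b)))
    return (mean_a - mean_b) / standard_error

def one_way_f(columns):
    """F = estimated variance between column means / within columns."""
    items = [x for col in columns for x in col]
    grand_mean = mean(items)
    between = sum(len(col) * (mean(col) - grand_mean) ** 2 for col in columns)
    within = sum((x - mean(col)) ** 2 for col in columns for x in col)
    n1 = len(columns) - 1
    n2 = len(items) - len(columns)
    return (between / n1) / (within / n2)

t_value = pooled_t(hedge_sparrow, wren)
f_value = one_way_f([hedge_sparrow, wren])
print(math.isclose(t_value ** 2, f_value, rel_tol=1e-9))
```

With two columns, $n_1 = 1$ and $n_2 = N - 2$, which is also the number of degrees of freedom for $t$.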

Two criteria of classification, one entry in each box. The data
of Table 26.1 had but one criterion of classification, the type of nest in
which the cuckoo's eggs were found. In Table 26.3 there are two criteria
of classification: (1) the different pencils, of which there were five, and
(2) the location on the pencil where the test was made, of which there
were four for each pencil. Each pencil was sharpened and tested, then
sharpened again and tested, and so on. It is conceivable that changes
in location may be associated with a progressive increase or decrease of
strength of the lead.

Table 26.3 has 5 × 4 = 20 boxes⁹ or cells of observed data, in each of
which there is but a single entry. We shall see later that it is desirable
to have more than one entry in a box, if that is possible. However, there
are some situations, such as the present one, in which only one entry is
possible. We could include more pencils or we could test each pencil
at more locations, but we could not have more than one test at a given
location on a pencil.

⁹ The term “box” is used in this text, since we have already used $\bar{X}_c$ to indicate
the mean of a column and shall later use $\bar{X}_b$ to indicate the mean of a box.

TABLE 26.3

Computation of Values Required for Analysis of Variance of Data of Strength
of Lead in Number 2 Pencils Manufactured by Company D

A. Observed data, in kilograms, and sums.

Location of test   Pencil 1   Pencil 2   Pencil 3   Pencil 4   Pencil 5   Row sum   (Row sum)^2
on pencil            X_1        X_2        X_3        X_4        X_5
I                    1.82       1.70       1.70       1.82       1.92      8.96      80.2816
II                   1.56       1.36       1.68       1.98       1.86      8.44      71.2336
III                  1.78       1.54       2.02       1.82       1.64      8.80      77.4400
IV                   1.74       1.92       1.92       1.64       1.75      8.97      80.4609
Column sum           6.90       6.52       7.32       7.26       7.17     35.17     309.4161

The row sum is $\sum_1^{N_r} X$ and the column sum is $\sum_1^{N_c} X$; 35.17 is $\Sigma X$ and 309.4161 is $\sum_1^{k_r}\left(\sum_1^{N_r} X\right)^2$.

Data from tests of pencils of various brands conducted in 1934 for the Eagle Pencil Co.

B. Squares of observed data and sum.

Location of test    X_1^2      X_2^2      X_3^2      X_4^2      X_5^2      Total
on pencil
I                   3.3124     2.8900     2.8900     3.3124     3.6864    16.0912
II                  2.4336     1.8496     2.8224     3.9204     3.4596    14.4856
III                 3.1684     2.3716     4.0804     3.3124     2.6896    15.6224
IV                  3.0276     3.6864     3.6864     2.6896     3.0625    16.1525
Total              11.9420    10.7976    13.4792    13.2348    12.8981    62.3517

$k_c = 5$, $N_c = 4$, $k_r = 4$, $N_r = 5$, $N = 20$.

$(\Sigma X)^2 = (35.17)^2 = 1,236.9289$.

$\sum_1^{k_c}\left(\sum_1^{N_c} X\right)^2 = (6.90)^2 + (6.52)^2 + (7.32)^2 + (7.26)^2 + (7.17)^2 = 247.8193$.

For the data of Table 26.3, we have variation between column means 
and total variation, as before. However, there is no variation within 
columns, but instead, there is variation between row means and a residual 
variation representing a difference between (1) total variation and (2) 
variation between column means plus variation between row means. 
We shall first compute each of these variations. 

Total variation. The expression is the same as that previously used,
and for the data of Table 26.3, we have

$$\Sigma X^2 - \frac{(\Sigma X)^2}{N} = 62.3517 - \frac{1,236.9289}{20} = 0.505255.$$


Variation between column means may also be obtained by use of the
expression used before, but, as pointed out in footnote 6, it may be
slightly simplified when the number of items in the columns is the same.
For the pencil data,

$$\frac{\sum_1^{k_c}\left(\sum_1^{N_c} X\right)^2}{N_c} - \frac{(\Sigma X)^2}{N} = \frac{247.8193}{4} - \frac{1,236.9289}{20} = 0.108380.$$



Variation between row means. This concept is the exact parallel of that
just given. Using the following symbols,

$\bar{X}_r$, the mean of a row,
$N_r$, the number of items in a row,
$k_r$, the number of rows,
$\sum_1^{N_r}$, a sum over the $N_r$ items in a row, and
$\sum_1^{k_r}$, a sum over the $k_r$ rows,

and remembering that the number of items in the rows is the same, we
have

$$\frac{\sum_1^{k_r}\left(\sum_1^{N_r} X\right)^2}{N_r} - \frac{(\Sigma X)^2}{N} = \frac{309.4161}{5} - \frac{1,236.9289}{20} = 0.036775.$$

Residual variation. The sum of the variation between column means
and the variation between row means is less than total variation. This
difference, which is

$$0.505255 - (0.108380 + 0.036775) = 0.360100,$$

is ordinarily referred to as "residual variation," since it is usually
computed as a residual. It is possible to compute this value directly by
means of the expression

$$\Sigma(X + \bar{X} - \bar{X}_r - \bar{X}_c)^2.$$

For the data of Table 26.3, this time-consuming computation gives
0.360100, the same value as was obtained as a residual.

Estimated variances. Table 26.4 summarizes the foregoing results and
shows also the number of degrees of freedom and the estimated variances.

TABLE 26.4

Summary of Computations for Analysis of Variance of Data of
Strength of Lead in Pencils

Source of variation       Amount of variation   Degrees of freedom   Estimated variance
Between column means      0.108380                4                  0.027095
Between row means         0.036775                3                  0.012258
Residual                  0.360100               12                  0.030008
Total                     0.505255               19

Since there are five column means, the variation of which was computed
around $\bar{X}$, variation between column means has four degrees of freedom.
Variation between row means involved four means, the variation of which
was in relation to $\bar{X}$, so variation between row means has three degrees
of freedom. Since total variation has N - 1 = 20 - 1 = 19 degrees
of freedom, residual variation has 19 - (4 + 3) = 12 degrees of freedom.

From the estimated variances of Table 26.4, we may now make two F
tests, one for column means:

$$F = \frac{0.027095}{0.030008} = 0.903; \quad n_1 = 4,\ n_2 = 12,$$

and the other for row means:

$$F = \frac{0.012258}{0.030008} = 0.408; \quad n_1 = 3,\ n_2 = 12.$$


Since neither of these F values exceeds 1.0, it is clear that neither the
estimated variance between column means (that is, between pencils) nor
the estimated variance between row means (that is, between locations)
exceeds our estimate of chance variance. Therefore, no significance test
is needed.* If the reader is interested in knowing whether either F value
is significantly less than 1.0, he may proceed as indicated earlier: compute
$1/F$ and look up this value in Appendix M with the degrees of freedom
reversed. He will find that neither of the F values is significantly less
than 1.0.
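
The arithmetic for the pencil data can be checked with a short program. What follows is only a minimal sketch in Python, not part of the original text: the function name `two_way_anova_single_entry` and the dictionary keys are illustrative assumptions, and the observations are those of Table 26.3 (rows are locations, columns are pencils).

```python
# Two criteria of classification, one entry per box (pencil data, Table 26.3).
# Rows are test locations I-IV; columns are pencils 1-5; values in kilograms.
data = [
    [1.82, 1.70, 1.70, 1.82, 1.92],
    [1.56, 1.36, 1.68, 1.98, 1.86],
    [1.78, 1.54, 2.02, 1.82, 1.64],
    [1.74, 1.92, 1.92, 1.64, 1.75],
]


def two_way_anova_single_entry(table):
    """Return amounts of variation, degrees of freedom, and F ratios."""
    k_r = len(table)          # number of rows
    k_c = len(table[0])       # number of columns
    n = k_r * k_c             # total number of items
    grand_sum = sum(sum(row) for row in table)
    correction = grand_sum ** 2 / n            # (sum X)^2 / N

    total = sum(x * x for row in table for x in row) - correction
    col_sums = [sum(row[j] for row in table) for j in range(k_c)]
    row_sums = [sum(row) for row in table]
    # Each column holds k_r items and each row holds k_c items.
    between_columns = sum(s * s for s in col_sums) / k_r - correction
    between_rows = sum(s * s for s in row_sums) / k_c - correction
    residual = total - between_columns - between_rows

    df_columns = k_c - 1
    df_rows = k_r - 1
    df_residual = (n - 1) - df_columns - df_rows
    residual_variance = residual / df_residual
    return {
        "total": total,
        "between_columns": between_columns,
        "between_rows": between_rows,
        "residual": residual,
        "df_columns": df_columns,
        "df_rows": df_rows,
        "df_residual": df_residual,
        "f_columns": (between_columns / df_columns) / residual_variance,
        "f_rows": (between_rows / df_rows) / residual_variance,
    }


result = two_way_anova_single_entry(data)
for key, value in result.items():
    print(key, round(value, 6))
```

The sketch gives the amounts of variation shown in Table 26.4 and F ratios of about 0.903 and 0.408. Whether an F ratio is significant must still be judged from a table such as Appendix M.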

The denominator for both of the F values computed above was estimated
residual variance; that was our measure of chance variance, since
it was the only one of the four sources of variation which would not be
affected by heterogeneity. The fact that there was but one entry in a
box in Table 26.3 makes it impossible to evaluate two elements which are
present and separable when there is more than one entry in a box. These
are: (1) interaction between the two criteria of classification and (2)
variation within boxes.

Two criteria of classification, more than one entry in a box.
Part I of Table 26.5 shows data of life in minutes of nine brands of
flashlight cells when in new condition and after 6-12 months' storage. Here
there are two criteria of classification, as before, but there are five entries
in each box. Total variation is now made up of four components:
variation between column means, variation between row means,
interaction between column and row means, and variation within boxes.
Using the sums shown in Table 26.5, we shall proceed to obtain the
numerical values of all of these.

Total variation. The expression for total variation is the same as
previously used.

$$\Sigma X^2 - \frac{(\Sigma X)^2}{N} = 34,325,736 - \frac{2,874,460,996}{90} = 34,325,736 - 31,938,455.51 = 2,387,280.49.$$


* If we ignore the locations on the pencils where the tests were made, the data of
Table 26.3 form a problem with one criterion of classification. On this basis, also,
variance between column means (that is, between pencils) is not significant. See the
first edition of this text, pp. 356-359.
Variation between column means employs the same formula as in the
preceding illustration, since the number of items in the two columns of
Part I of Table 26.5 is the same.

$$\frac{\sum_1^{k_c}\left(\sum_1^{N_c} X\right)^2}{N_c} - \frac{(\Sigma X)^2}{N} = \frac{1,454,015,716}{45} - \frac{2,874,460,996}{90} = 32,311,460.36 - 31,938,455.51 = 373,004.85.$$

Variation between row means also uses the same expression as in the
preceding example, since the number of items in the nine rows of Part I
of Table 26.5 is the same.

$$\frac{\sum_1^{k_r}\left(\sum_1^{N_r} X\right)^2}{N_r} - \frac{(\Sigma X)^2}{N} = \frac{333,359,050}{10} - \frac{2,874,460,996}{90} = 33,335,905 - 31,938,455.51 = 1,397,449.49.$$


Variation within boxes. This is the variation of the items in the boxes
around the means of the boxes. Symbolically it is

$$\sum_1^{k_b}\left[\sum_1^{N_b}(X - \bar{X}_b)^2\right],$$

where

$\bar{X}_b$ is the mean of a box,
$N_b$ is the number of items in a box,
$k_b$ is the number of boxes,
$\sum_1^{N_b}$ is a sum over the $N_b$ items in a box, and
$\sum_1^{k_b}$ is a sum over the $k_b$ boxes.

By a process similar to that shown in Appendix S, section 26.2, this
expression becomes

$$\Sigma X^2 - \sum_1^{k_b}\left[\frac{\left(\sum_1^{N_b} X\right)^2}{N_b}\right].$$

However, there is the same number of items in each of the boxes of Table
TABLE 26.5

Computation of Values Required for Analysis of Variance of Data of Life
of Type D Flashlight Cells

I. Observed data and sums for columns and rows
II. Squares and sums for columns and rows


i 



After j 

Sr 

Brand 

New 

stor- 1 

V v 



age 




— — 

“ 



691)' 

612 



728 

5id 


A 

7:J0 

558- 

6,214 


G83 

479; 

1 


720 

^35| 



66i 

64d 



! 04G 

642 


H 

OOd 

6116 

6,507 


1 674 

678 



I m 

, 646 



Brand j New 


After 

storage 



7) 




G 


484,416 
529,984 
532,9001 
466,489 
518,400| 
436,92 1 
417,316 
480,2491 
454,276 
459^^684 

501,061 

573,049 

692.224 
619,369 
577^6001 
765,66(i| 
538,756 
714,025| 
636,804 

783.225 
4767100 
537,2891 
541,696 
477,481 
431,281 
537,28',)' 
573,049^ 
,509,796 
369,664 
48^0,249 

■228,481 
538,7561 

403,225 
451,584 
168, 100 
“ "226,960 

343,396, 
156,025 
171,3961 
191.844 
462,400 
257,049 
131,044 
209,764 
1 308.025 1 

20,361,174 


Nr 

S.VI 

J 


374,544 

263,169 

311,364' 

229,441 

2^5.025 

■ 413 , 440 ' 

412.164, 
404,4'J6i 
1.59,684 
41J,3_16 
■ 52l"284l 
448, 9(H) 
421,201 
515,524 
JMOJOl 
' 498,4361 
431.649 


3,955,732 


4,355,555 


5, 130,856 


4,440,023 


529,984| 5,726,7-1 
331,776; 

^56,516 
3947384 
419,904 
362,404 
386,884 
409,600 
451.5841 
364,8161 
386,884 j 
331,776 
432,961 
""■87,6161, 

207,025! 

102,41M) 

73,984 
230,4(8) 

"■17o75C9 


4.438,071 


2,491,574 


19,044 

1,444 

54,7.56 

166,461 

295,936 

51.529 

156,816 

1,621.223 

2,162,931 

13,964,562 

:p4, 325,736 = 
TABLE 26.5

III. Sums and squares of sums for boxes

Box               Sum over the N_b items, ΣX    (ΣX)^2
Row 1, Col. 1      3,557                        12,652,249
       Col. 2      2,657                         7,059,649
Row 2, Col. 1      3,352                        11,235,904
       Col. 2      3,245                        10,530,025
Row 3, Col. 1      3,885                        15,093,225
       Col. 2      3,207                        10,284,849
Row 4, Col. 1      4,102                        16,826,404
       Col. 2      3,413                        11,648,569
Row 5, Col. 1      3,509                        12,313,081
       Col. 2      3,140                         9,859,600
Row 6, Col. 1      3,505                        12,285,025
       Col. 2      3,132                         9,809,424
Row 7, Col. 1      2,929                         8,579,041
       Col. 2      1,823                         3,323,329
Row 8, Col. 1      2,303                         5,303,809
       Col. 2      1,366                         1,865,956
Row 9, Col. 1      2,562                         6,563,844
       Col. 2      1,927                         3,713,329
Total             53,614                       168,947,312 = $\sum_1^{k_b}\left(\sum_1^{N_b} X\right)^2$

^ Life (if a I ell iH the tm’e in nun n teg for «m ll ' «)lta>ie to dniii to 0 00 \ olts 
ul.en tested jts in I'Ydeial Spf c itieation W-B-lOlh, T\ pe T’> f elN an- ilie 
htfj/fiqt fLislihj.'‘ht m;?*; ^ 

Data m pait I fuinislu**! tlnoOKL the < ourtegv of ( 'on'i'ino is’ UeM'Hreli 
\Va‘*hi niflou Niiw .[( rsev, from its tests of flaHhhy:ht hatti'iie^ r-'poiiecl in 
CKYs A'ooi.sl llL'id I^iilletm 





$(\Sigma X)^2 = (53,614)^2 = 2,874,460,996.$

$(29,704)^2 + (23,910)^2 = 1,454,015,716.$

$(6,214)^2 + (6,597)^2 + (7,092)^2 + (7,515)^2 + (6,649)^2 + (6,637)^2 + (4,752)^2 + (3,669)^2 + (4,489)^2 = 333,359,050.$


26.5, Part I; so we can write

$$\Sigma X^2 - \frac{\sum_1^{k_b}\left(\sum_1^{N_b} X\right)^2}{N_b} = 34,325,736 - \frac{168,947,312}{5} = 34,325,736 - 33,789,462.4 = 536,273.6.$$
Interaction. The numerical value for total variation exceeds the sum
of the three variations last obtained. This difference is the variation due
to interaction between column means and row means. Its numerical
value is

$$2,387,280.49 - (373,004.85 + 1,397,449.49 + 536,273.6) = 80,552.55.$$

Alternatively, but much more laboriously, interaction may be computed
directly from

$$\sum_1^{k_b} N_b(\bar{X}_b + \bar{X} - \bar{X}_r - \bar{X}_c)^2.$$
Estimated variances. Table 26.6 shows the amount of variation, the
degrees of freedom, and the estimated variance for each source of
variation; total variation and the degrees of freedom for total variation are also
shown.

TABLE 26.6

Summary of Computations for Analysis of Variance of Data of Life
of Type D Flashlight Cells

Source of variation       Amount of variation   Degrees of freedom   Estimated variance
Between column means        373,004.85            1                  373,004.85
Between row means         1,397,449.49            8                  174,681.19
Interaction                  80,552.55            8                   10,069.07
Within boxes                536,273.6            72                    7,448.24
Total                     2,387,280.49           89

The number of degrees of freedom for variation within boxes is
$k_b(N_b - 1) = 72$, since the deviation of each item in a box was taken from
the mean of the box. Degrees of freedom for interaction are obtained by
subtracting the degrees of freedom for the other three sources of variation
from the degrees of freedom for total variation. Thus, the number of
degrees of freedom for interaction is

89 - (1 + 8 + 72) = 8.

We are now ready to test the estimated variance between column
means and the estimated variance between row means. However, we
must first decide which of the other two variances is to be the denominator
of the F test. It is true that the variation within boxes is the only one
of the four sources of variation which would be unaffected by
heterogeneity among column, row, or box means. It would therefore appear
that estimated variance within boxes should be our measure of chance.
But there is another point to consider: if the difference between row (or
column) means is not greater than the interaction between row and
column means, the difference can hardly be considered meaningful.*
Consequently, the usual procedure is as follows: first test the estimated
variance of interaction against the estimated variance within boxes; if
the estimated variance of interaction is significantly larger than the
estimated variance within boxes, test each of the other two estimated
variances against the estimated variance of interaction; if the estimated
variance of interaction is smaller than, or is not significantly larger than,
the estimated variance within boxes, pool the variation and the degrees
of freedom from these two sources and compute a new estimated variance
to be used as the denominator for the F test.**

Testing first the estimated variance of interaction against estimated
variance within boxes, we have

$$F = \frac{10,069.07}{7,448.24} = 1.35. \quad (n_1 = 8;\ n_2 = 72.)$$

From Appendix M it is seen that this value of F is not significantly
greater than 1.0, so estimated variance of interaction does not
significantly exceed the estimated variance within boxes.

Since interaction is not significant, we pool the variation of interaction
and within boxes and divide this value by the degrees of freedom for
these two sources of variation, giving

$$616,826.15 \div 80 = 7,710.33.$$

This is the denominator of F for testing estimated variance between
column means and estimated variance between row means.

For column means,

$$F = \frac{373,004.85}{7,710.33} = 48.38. \quad (n_1 = 1;\ n_2 = 80.)$$


* This point is not so easy to grasp from the data of Table 26.5 as it is from an
illustration given by Mood. His example, for which no data are given, deals with
five men (columns) operating four machines (rows) and has three observations in each
box. He notes that one man may do better on one machine than another man, but
the first man may not do as much better or may even do worse on a second machine.
To be meaningful, the differences between machines should exceed the interaction;
otherwise, one might install what appeared to be the best machine but find that the
man assigned to operate that machine is not as productive on it as he would have
been on another machine. See A. M. Mood, Introduction to the Theory of Statistics,
McGraw-Hill Book Company, New York, 1950, pp.

** Some authorities recommend using the larger of the two variances attributable
to interaction or within boxes. If estimated variance of interaction is the larger,
but not significantly so, this procedure allows for possible small effects of interaction
not revealed when estimated variance of interaction was tested. It also tends to
increase the number of Type II errors.
From Appendix M it is seen that this value of F is far beyond the 0.001
point, so the difference between column means (between fresh and stored
cells) is real.

For row means,

$$F = \frac{174,681.19}{7,710.33} = 22.66. \quad (n_1 = 8;\ n_2 = 80.)$$

This F value, too, is beyond the 0.001 point, and the difference between
row means (between brands of cells) is significant.

Situations in which there are two criteria of classification with unequal
numbers of items in the boxes, and those involving three or more criteria
of classification, are beyond the scope of this book.*
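
The computations for the flashlight-cell data can be retraced from the sums alone. The following is a minimal sketch in Python, not part of the original text: the variable names are illustrative assumptions, the box sums are those of Part III of Table 26.5, and the sum of the squared observations (34,325,736) is taken from the text. The sketch applies the pooled denominator because the text found interaction not significant; that judgment still rests on a table such as Appendix M.

```python
# Two criteria of classification, five entries per box (flashlight cells).
# Each pair is (new, after storage) for one brand: the box sums of Table 26.5, Part III.
box_sums = [
    (3557, 2657), (3352, 3245), (3885, 3207),
    (4102, 3413), (3509, 3140), (3505, 3132),
    (2929, 1823), (2303, 1366), (2562, 1927),
]
n_b = 5                    # items in each box
sum_of_squares = 34_325_736  # sum of the squared observations, from the text

k_r = len(box_sums)        # nine rows (brands)
k_c = len(box_sums[0])     # two columns (new, after storage)
n = k_r * k_c * n_b        # 90 items in all

row_sums = [sum(box) for box in box_sums]
col_sums = [sum(box[j] for box in box_sums) for j in range(k_c)]
grand = sum(row_sums)
correction = grand ** 2 / n                      # (sum X)^2 / N

total = sum_of_squares - correction
between_columns = sum(s * s for s in col_sums) / (k_r * n_b) - correction
between_rows = sum(s * s for s in row_sums) / (k_c * n_b) - correction
within = sum_of_squares - sum(s * s for box in box_sums for s in box) / n_b
interaction = total - between_columns - between_rows - within

df_columns = k_c - 1
df_rows = k_r - 1
df_within = k_r * k_c * (n_b - 1)
df_interaction = (n - 1) - df_columns - df_rows - df_within

# First test interaction against variation within boxes.
f_interaction = (interaction / df_interaction) / (within / df_within)

# Pooled denominator, used here because interaction was found not significant.
pooled_variance = (interaction + within) / (df_interaction + df_within)
f_columns = (between_columns / df_columns) / pooled_variance
f_rows = (between_rows / df_rows) / pooled_variance

print(round(f_interaction, 2), round(f_columns, 2), round(f_rows, 2))
```

Had the interaction F been significant, the procedure described in the text would call for dividing by the estimated variance of interaction instead of the pooled value.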


Interrelationships Between Normal, t, $\chi^2$, and F

In Chapter 24 it was noted that the t distribution approaches the
normal distribution as n approaches infinity. The normal distribution is
therefore a special case of the t distribution, as shown in the last row of
Appendix I.

In Chapter 25 it was pointed out that, for the same set of data, normal
deviates yield the same probabilities as do $\chi^2$ values when n = 1 for $\chi^2$.
More specifically, we found, upon comparing Appendices H and J, that
for a given probability $(x/\sigma)^2 = \chi^2$ when n = 1 for $\chi^2$.

In this chapter it was noted that, for any given probability, $\chi^2/n = F$,
when n for $\chi^2$ equals $n_1$ for F and when $n_2 = \infty$ for F. This may be seen
by comparing Appendices J and M.

In this chapter, also, it was pointed out that for any given probability,
$t^2 = F$ when n for t equals $n_2$ for F and when $n_1$ for F is 1. This is
apparent from an examination of Appendices I and M.

What has been said in the preceding four paragraphs has been brought
together in Chart 26.2. From this chart it is clear that F is an inclusive
distribution in that the other three distributions are merely special cases
of F.

MEASURES OF SKEWNESS AND KURTOSIS

Skewness. In Chapter 10 the skewness of the distribution of the
grades of 225 midshipmen, as measured by $\beta_1$, was found to be 0.18.

* See H. M. Walker and J. Lev, Statistical Inference, Henry Holt and Company,
New York, 1953, pp. 38(),
Chart 26.2. Relationship Between the Normal, t, $\chi^2$, and F Distributions.

Each box within the double rules may be thought of as the end of a drawer which,
when pulled out, reveals the F values and, in some instances, the squared normal
($N^2$), $t^2$, and $\chi^2/n$ values for the indicated probabilities. The entire diagram is F. The
box at the extreme lower left is $N^2$. The left column is $t^2$. The bottom row is $\chi^2/n$.
This chart is an elaboration of one given in K. Mather, Statistical Analysis in Biology,
p. 17, Interscience Publishers, New York, 1943.

Using 0.05 as a criterion, is this value of $\beta_1$ significantly greater than 0?
Egon S. Pearson has prepared tables of the 0.10 and 0.02 limits of $\beta_1$
when based on samples drawn from a normal population. This table is
shown as Appendix O, and the small chart included with that appendix
shows the shape of the distribution of $\beta_1$. Appendix O does not show the
values of $\beta_1$ for N = 225, but for either N = 200 or N = 250 the value
0.18 is beyond the 0.02 point. Significant skewness is present.

In Chapter 10 the value of $\beta_1$ for the distribution of ages at death of 371
American inventors was found to be 0.16. From Appendix O this value,
also, is seen to be significantly greater than zero.

In Chapter 23 a normal curve was fitted to the distribution of baseball
throws for distance by 303 first-year high school girls. $\beta_1$ was found to
be 0.0104. The value for $\beta_1$ does not differ significantly from 0, as may
be seen from Appendix O.

Kurtosis. Table 10.9 showed a leptokurtic distribution, the cost of
building five-room wood houses, with $\beta_2$ = 4.46 and N = 82. With 0.05
as our criterion, is this value of 4.46 significantly different from 3.0, the
value of $\beta_2$ for a normal distribution? Appendix P shows the upper and
lower 0.01 and 0.05 limits of $\beta_2$ when based on random samples from a
normal distribution. Since Appendix P shows no entries for values of
N below 100, we cannot be sure whether or not $\beta_2$ = 4.46 is beyond the
upper 0.01 point, but it is probably beyond 0.05.

In Table 10.10 a distribution of the length of life of a group of electric
lamps was found to have $\beta_2$ = 2.22. We cannot make a test to determine
whether 2.22 is significantly less than 3.0, since the data of Table 10.10
were in terms of percentage frequencies and we do not know the number
of lamps involved. However, if we look at Appendix P, we may note
that $\beta_2$ = 2.18 is at the lower 0.01 limit and $\beta_2$ = 2.35 is at the lower 0.05
limit when the sample consists of but 100 items. For samples of 125
items or more, $\beta_2$ = 2.22 is beyond the 0.01 point. If the data of Table
10.10 include 100 or more lamps (and they should, or percentages should
not have been shown), the distribution is significantly platykurtic.


CORRELATION COEFFICIENTS 


Simple correlation. When a correlation analysis has been made for
a sample, a number of questions may be raised. Among them are: Does
the value of r differ significantly from zero? Does the value of r differ
significantly from a specified value other than zero? Do two r values
differ significantly from each other? What are the confidence limits
of the correlation in the population? What single estimate of the
correlation in the population may be made? We shall consider each of these
in turn.

Does the value of r differ significantly from zero? Here we test the
hypothesis that there is no correlation in the population, that is, that
$r_{\mathcal{P}} = 0$. If the hypothesis is discredited, the correlation is
considered significant. The procedure involves the t test with which the
reader is already familiar. The value of t is obtained from

$$t = r\sqrt{\frac{N - 2}{1 - r^2}},$$

after which we ascertain P from Appendix I with n = N - 2. (Two
degrees of freedom are lost because of the two constants in the estimating
equation.) For the data of height growth and diameter growth of
trees, N was 20 and r was +0.758. These give

$$t = 0.758\sqrt{\frac{20 - 2}{1 - 0.574}} = 4.93.$$

When n = 20 - 2 = 18, Appendix I shows that t = 4.93 has P < 0.001.
Consequently, the value of r is significant.
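
The value of t for the tree data is easily verified. The lines below are a minimal sketch in Python, not part of the original text; the function name `t_for_r` is an illustrative assumption, and the probability must still be read from a t table such as Appendix I.

```python
import math


def t_for_r(r, n_pairs):
    """t for testing whether a simple correlation differs from zero (n = N - 2)."""
    return r * math.sqrt((n_pairs - 2) / (1 - r * r))


t_value = t_for_r(0.758, 20)
degrees_of_freedom = 20 - 2
print(round(t_value, 2), degrees_of_freedom)
```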

It is of interest that this test is the same as the test to ascertain whether
b differs significantly from zero. The expression to use is*

$$t = b\sqrt{\frac{\Sigma x^2 (N - 2)}{\Sigma y_s^2}}.$$

For the tree data, we found b = +1.677, $\Sigma x^2$ = 42.6056, and $\Sigma y_s^2$ =
88.74. Consequently,

$$t = 1.677\sqrt{\frac{42.6056(20 - 2)}{88.74}} = 4.93,$$

the same as obtained before.

Does the value of r differ significantly from a specified value other than
zero? When $r_{\mathcal{P}} = 0$, the distribution of values of r from random samples
is symmetrical about 0, ranging from -1.0 to +1.0. When $r_{\mathcal{P}} \neq 0$, the
distribution of values of r from random samples is not symmetrical around
$r_{\mathcal{P}}$, and the t test is inappropriate. To test whether r differs significantly

* A more complete statement is this: We know that $t^2 = F$ when $n_1$ for F is 1 and
when n for t equals $n_2$ for F. The F test corresponding to the above t test is

$$F = \frac{\Sigma y_c^2 \div (2 - 1)}{\Sigma y_s^2 \div (N - 2)}.$$

Explained variation has 2 - 1 = 1 degree of freedom, since it is based upon the
deviations of the $Y_c$ values ($Y_c = a + bX$) from $\bar{Y}$. Unexplained variation has
N - 2 degrees of freedom, since it is based upon the deviations of the N Y values from
$Y_c = a + bX$.

For proof of the equality, see Appendix S, section 26.3. A number of alternative
formulas for testing r or b are available. Among these are:

$$t = \Sigma xy\sqrt{\frac{N - 2}{\Sigma x^2\,\Sigma y^2 - (\Sigma xy)^2}}.$$
from a value of $r_{\mathcal{P}} \neq 0$, we transform r into*

$$z = 1.15129 \log\frac{1 + r}{1 - r},$$

the distribution of which is approximately normal around

$$z_{\mathcal{P}} = 1.15129 \log\frac{1 + r_{\mathcal{P}}}{1 - r_{\mathcal{P}}},$$

with the standard error of z being**

$$\sigma_z = \frac{1}{\sqrt{N - 2.6667}}.$$

Suppose that we wish to know whether our r of +0.758 for the
tree-growth data differs significantly from a hypothetical $r_{\mathcal{P}}$ of +0.750. We
compute

$$z = 1.15129 \log\frac{1 + 0.758}{1 - 0.758} = 0.992;$$

$$z_{\mathcal{P}} = 1.15129 \log\frac{1 + 0.750}{1 - 0.750} = 0.973;$$

$$\sigma_z = \frac{1}{\sqrt{20 - 2.6667}} = 0.240; \text{ and}$$

$$\frac{x}{\sigma} = \frac{z - z_{\mathcal{P}}}{\sigma_z} = \frac{0.992 - 0.973}{0.240} = \frac{0.019}{0.240} = 0.079.$$

Appendix H tells us that we may expect a difference this large or larger
owing to chance causes about 94 times in 100. The hypothesis that
r = +0.758 is the correlation of a random sample from a population
having $r_{\mathcal{P}}$ = +0.750 is not impugned. The difference is not significant.
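
The z transformation and the resulting test can be sketched in Python as follows. This is a minimal sketch, not part of the original text: the function names are illustrative assumptions. It relies on the fact that 1.15129 times the common logarithm of (1 + r)/(1 - r) is the inverse hyperbolic tangent of r, and it replaces the table of normal areas with the complementary error function. Because it carries more decimal places than the hand computation, the ratio comes out near 0.077 and not 0.079, but the conclusion is unchanged.

```python
import math


def fisher_z(r):
    """z = 1.15129 * log10((1 + r) / (1 - r)), which equals atanh(r)."""
    return math.atanh(r)


def z_standard_error(n_pairs, correction=8 / 3):
    """Standard error of z as given in the text: 1 / sqrt(N - 2.6667)."""
    return 1 / math.sqrt(n_pairs - correction)


def two_sided_probability(normal_deviate):
    """Chance of a normal deviate this large or larger in either direction."""
    return math.erfc(abs(normal_deviate) / math.sqrt(2))


z = fisher_z(0.758)
z_population = fisher_z(0.750)
sigma_z = z_standard_error(20)
ratio = (z - z_population) / sigma_z
probability = two_sided_probability(ratio)
print(round(z, 3), round(z_population, 3), round(sigma_z, 3))
print(round(ratio, 3), round(probability, 2))
```

Passing `correction=3` to `z_standard_error` gives the more usual standard error, 1 divided by the square root of N - 3.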

Do two values of r differ significantly from each other? If we were
interested in testing the significance of the difference between the value of
r = +0.758 ($z_1$ = 0.992) for our sample and that of another sample r of
+0.750 ($z_2$ = 0.973), obtained from 20 pairs of items, we would compute

$$\sigma_{z_1} = \frac{1}{\sqrt{20 - 2.6667}} = 0.240;$$

$$\sigma_{z_2} = \frac{1}{\sqrt{20 - 2.6667}} = 0.240;$$

$$\sigma_{z_1 - z_2} = \sqrt{\sigma_{z_1}^2 + \sigma_{z_2}^2} = \sqrt{(0.240)^2 + (0.240)^2} = 0.339; \text{ and}$$

$$\frac{x}{\sigma} = \frac{z_1 - z_2}{\sigma_{z_1 - z_2}} = \frac{0.992 - 0.973}{0.339} = \frac{0.019}{0.339} = 0.056.$$

The table of normal areas (Appendix H) gives P = 0.95, and we conclude
that the difference is not significant.

* See R. A. Fisher, Statistical Methods for Research Workers, Hafner Publishing Co.,
New York, 1950, 11th ed., pp. 197-204.

** The usual expression is $\sigma_z = \frac{1}{\sqrt{N - 3}}$. For explanation of that given here, see
"New Light on the Correlation Coefficient and its Transforms," by Harold Hotelling,
Journal of the Royal Statistical Society, Series B, Vol. XV, No. 2, 1953, p. 220. On
pages 223-224, Hotelling suggests two modifications of z which may be more nearly
normal than the form given above.

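Confidence limits of $r_{\mathcal{P}}$. As in the case of $\bar{X}_{\mathcal{P}}$, $\pi$, and $\sigma$, we may wish to
know the confidence limits of $r_{\mathcal{P}}$. These are obtained by use of the
expression

$$z - z_{\mathcal{P}} = \pm\frac{x}{\sigma}\,\sigma_z.$$

This will give us two values for $z_{\mathcal{P}}$, which are then converted to $r_{\mathcal{P}}$ values.

If we wish the 95 per cent confidence limits for the tree-growth data,
where r was +0.758 and z = 0.992, we have

$$0.992 - z_{\mathcal{P}} = \pm(1.960)(0.240).$$

$$z_{\mathcal{P}} = 0.992 \pm 0.4704.$$

$$z_{\mathcal{P}_1} = 0.5216 \text{ and } z_{\mathcal{P}_2} = 1.4624.$$

Converting $z_{\mathcal{P}_1}$ to $r_{\mathcal{P}_1}$ and $z_{\mathcal{P}_2}$ to $r_{\mathcal{P}_2}$ gives

$$r_{\mathcal{P}_1} = +0.479 \text{ and } r_{\mathcal{P}_2} = +0.898,$$

which are the 95 per cent confidence limits.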
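
The conversion from z back to r is the hyperbolic tangent, so the confidence limits can be sketched in a few lines of Python. This is a minimal sketch, not part of the original text: the function name `r_confidence_limits` and its parameters are illustrative assumptions, and the standard error is the one used in the text, 1 divided by the square root of N - 2.6667.

```python
import math


def r_confidence_limits(r, n_pairs, normal_deviate=1.960, correction=8 / 3):
    """Confidence limits of the population correlation by way of z."""
    z = math.atanh(r)                              # r converted to z
    sigma_z = 1 / math.sqrt(n_pairs - correction)  # standard error of z
    lower_z = z - normal_deviate * sigma_z
    upper_z = z + normal_deviate * sigma_z
    return math.tanh(lower_z), math.tanh(upper_z)  # z limits converted back to r


lower, upper = r_confidence_limits(0.758, 20)
print(round(lower, 3), round(upper, 3))
```

The limits agree with those of the text to within about 0.001; the small difference arises from the rounding of z and of its standard error in the hand computation.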

Single estimate of $r_{\mathcal{P}}$. When discussing variances, we noted that a
single estimate of $\sigma^2$ might be made from a sample by means of

$$\hat{\sigma}^2 = \frac{\Sigma x^2}{N - 1}.$$

In somewhat similar fashion, an estimate may be made of $r_{\mathcal{P}}^2$. We shall
refer to it as $\bar{r}^2$. We use $\bar{r}^2$, rather than the more logical $\hat{r}_{\mathcal{P}}^2$, to indicate an
estimate of the coefficient of determination in the population, in order to
avoid complicated subscripts in later sections of this chapter.

We already know, from footnote 8 in Chapter 19, that

$$r^2 = 1 - \frac{s_{y.x}^2}{s_y^2}.$$

Now, $s_{y.x}^2$ is a biased estimate of $\sigma_{y.x}^2$, and $s_y^2$ is a biased estimate of $\sigma_y^2$.
Unbiased estimates are obtained by dividing the measures of variation
by the appropriate number of degrees of freedom, rather than by N.
Thus,

$$\hat{\sigma}_{y.x}^2 = \frac{\Sigma y_s^2}{N - 2}; \quad \hat{\sigma}_y^2 = \frac{\Sigma y^2}{N - 1}; \text{ and}$$

$$\bar{r}^2 = 1 - \frac{\Sigma y_s^2 \div (N - 2)}{\Sigma y^2 \div (N - 1)}.$$

Since

$$\frac{\Sigma y_s^2}{\Sigma y^2} = 1 - r^2,$$

we may write

$$\bar{r}^2 = 1 - \frac{\Sigma y_s^2}{\Sigma y^2}\cdot\frac{N - 1}{N - 2} = 1 - (1 - r^2)\frac{N - 1}{N - 2}.$$

For the tree-growth data, where $r^2$ = 0.574 and r = +0.758:

$$\bar{r}^2 = 1 - (1 - 0.574)\frac{20 - 1}{20 - 2} = 0.550.$$

$$\bar{r} = +0.742.$$

When $r^2$ is very low, $\bar{r}^2$ may be negative. In such a case, the correlation
in the population should be considered to be zero.
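
The estimate for the tree-growth data, and the rule for negative values, can be sketched in Python. This is a minimal sketch, not part of the original text: the function name `estimated_population_r_squared` and the parameter `constants` (the number of constants in the estimating equation, two for a straight line) are illustrative assumptions.

```python
import math


def estimated_population_r_squared(r_squared, n_items, constants=2):
    """1 - (1 - r^2)(N - 1)/(N - constants), taken as zero when negative."""
    estimate = 1 - (1 - r_squared) * (n_items - 1) / (n_items - constants)
    # A negative estimate is treated as zero correlation in the population.
    return max(estimate, 0.0)


tree_estimate = estimated_population_r_squared(0.574, 20)
tree_r = math.sqrt(tree_estimate)
low_estimate = estimated_population_r_squared(0.01, 20)
print(round(tree_estimate, 3), round(tree_r, 3), low_estimate)
```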

Non-linear correlation. When dealing with a second-degree curve,
a third-degree curve, or a curve of higher order, we may wish to know:
(1) whether the non-linear coefficient of determination is significantly
larger than a coefficient based upon a curve of lower order, or (2) whether
the non-linear coefficient is significantly greater than zero. We may also
occasionally wish to make an estimate of the correlation in the population.

Second-degree curve. For the data of diameter and volume of ponderosa
pine trees, we found, in Chapter 20, that

$$r_{YX}^2 = \frac{\text{Variation explained by straight line}}{\text{Total variation}} = \frac{\Sigma y_c^2}{\Sigma y^2} = \frac{152,259.2}{159,698} = 0.953,$$

and

$$r_{Y.XX^2}^2 = \frac{\text{Variation explained by second-degree curve}}{\text{Total variation}} = \frac{\Sigma y_{cY.XX^2}^2}{\Sigma y^2} = \frac{156,235.5}{159,698} = 0.978.$$

The simplest method of ascertaining whether $r_{Y.XX^2}^2$ is significantly larger
than $r_{YX}^2$ is to compute the measure $r_{YX^2.X}^2$ mentioned in footnote 2 of
Chapter 20, and make a t test of $r_{YX^2.X}$ with n = N - 3. (Explanation
of the use of N - 3 is given below.) This coefficient of partial
determination, $r_{YX^2.X}^2$, which tells us the proportion that (1) the added
variation explained by the use of $X^2$ constitutes of (2) the variation
unexplained by the straight line, is

$$r_{YX^2.X}^2 = \frac{r_{Y.XX^2}^2 - r_{YX}^2}{1 - r_{YX}^2} = \frac{0.978 - 0.953}{1 - 0.953} = 0.532.$$

The t test is exactly the same as the t test for r, except that we use
N - 3 instead of N - 2.

$$t = \sqrt{\frac{r_{YX^2.X}^2 (N - 3)}{1 - r_{YX^2.X}^2}} = \sqrt{\frac{0.532(20 - 3)}{1 - 0.532}} = 4.4.$$

When n = 17, a value of t = 4.4 is beyond the 0.001 level (see
Appendix I), so we conclude that the use of $X^2$ has explained a significantly
larger amount of variation.
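
The coefficient of partial determination and its t value can be checked as follows. This is a minimal sketch in Python, not part of the original text: the function names and the parameter `constants` (the number of constants in the higher-order curve, three for a second-degree curve) are illustrative assumptions.

```python
import math


def partial_determination(r2_higher, r2_lower):
    """Added variation explained, as a proportion of variation left unexplained."""
    return (r2_higher - r2_lower) / (1 - r2_lower)


def t_for_partial(r2_partial, n_items, constants):
    """t for the partial coefficient, with n = N - constants degrees of freedom."""
    return math.sqrt(r2_partial * (n_items - constants) / (1 - r2_partial))


r2_partial = partial_determination(0.978, 0.953)
t_partial = t_for_partial(r2_partial, 20, 3)
print(round(r2_partial, 3), round(t_partial, 1))
```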
The foregoing is a simpler equivalent of the usual F test* in which

$$F = \frac{\left[\begin{pmatrix}\text{Variation explained by}\\ \text{second-degree curve}\end{pmatrix} - \begin{pmatrix}\text{Variation explained}\\ \text{by straight line}\end{pmatrix}\right] \div \text{Degrees of freedom}}{\left[\begin{pmatrix}\text{Total}\\ \text{variation}\end{pmatrix} - \begin{pmatrix}\text{Variation explained by}\\ \text{second-degree curve}\end{pmatrix}\right] \div \text{Degrees of freedom}} = \frac{(\Sigma y_{cY.XX^2}^2 - \Sigma y_{cYX}^2) \div 1}{(\Sigma y^2 - \Sigma y_{cY.XX^2}^2) \div (N - 3)},$$

with $n_1$ = 1 and $n_2$ = N - 3. The number of degrees of freedom in the
numerator is 2 - 1 = 1, because it is the difference between the number
of degrees of freedom for explained variation computed from the
second-degree curve (which is two) and the number of degrees of freedom for
explained variation computed from the straight line (which is one).
Explained variation obtained from the second-degree curve has 3 - 1 = 2
degrees of freedom because the equation has three constants and the
variation of the computed values was taken around $\bar{Y}$; explained variation
gotten from the straight line has 2 - 1 = 1 degree of freedom because
the equation has two constants and the variation of the computed values
was taken around $\bar{Y}$. The number of degrees of freedom for
$\Sigma y^2 - \Sigma y_{cY.XX^2}^2$, in the denominator, is N - 3 because the unexplained
variation was obtained from the squared differences of the Y values (of
which there are N) from a second-degree curve, which has three constants.
Alternatively, we may note that total variation has N - 1 degrees of
freedom and that explained variation has 3 - 1 degrees of freedom;
therefore, their difference, which is unexplained variation, has (N - 1) -
(3 - 1) = N - 3 degrees of freedom.

If the numerator and denominator of the expression given above for
F are each divided by $\Sigma y^2$, we have the alternative form

$$F = \frac{(r_{Y.XX^2}^2 - r_{YX}^2) \div 1}{(1 - r_{Y.XX^2}^2) \div (N - 3)},$$

with $n_1$ = 1 and $n_2$ = N - 3.
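
The alternative form lends itself to a quick numerical check for the ponderosa pine data. This is a minimal sketch in Python, not part of the original text: the function name `f_for_added_term` and its parameters are illustrative assumptions. The square root of the resulting F is about 4.4, the t value obtained above, as the stated equivalence of the two tests requires.

```python
import math


def f_for_added_term(r2_higher, r2_lower, n_items, constants_higher):
    """F (n1 = 1, n2 = N - constants) for the one term added to the lower-order curve."""
    numerator = (r2_higher - r2_lower) / 1
    denominator = (1 - r2_higher) / (n_items - constants_higher)
    return numerator / denominator


f_added = f_for_added_term(0.978, 0.953, 20, 3)
print(round(f_added, 2), round(math.sqrt(f_added), 1))
```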

To ascertain whether $r_{Y.XX^2}^2$ = 0.978 is significantly greater than 0, we
use the F test, computing either**

$$F = \frac{r_{Y.XX^2}^2 \div (3 - 1)}{(1 - r_{Y.XX^2}^2) \div (N - 3)}$$

or

$$F = \frac{\Sigma y_{cY.XX^2}^2 \div (3 - 1)}{(\Sigma y^2 - \Sigma y_{cY.XX^2}^2) \div (N - 3)},$$

with $n_1$ = 3 - 1 and $n_2$ = N - 3. We use (3 - 1) degrees of freedom
in the numerator because the second-degree curve has three constants
and explained variation computed from that curve was taken around $\bar{Y}$;
more generally, the degrees of freedom for explained variation are
(m - 1), where m is the number of constants in the estimating equation.
The number of degrees of freedom in the denominator was explained in
the preceding paragraph; in general, the number of degrees of freedom for
unexplained variation is (N - m).

* The equivalence of the t test and the F test for this and other coefficients of partial
determination is shown in Appendix S, section 26.4.

** If both numerator and denominator of the second expression are divided by $\Sigma y^2$,
the first expression is obtained.

Using the first expression for the data of ponderosa pine trees, we get

$$F = \frac{0.978 \div (3 - 1)}{(1 - 0.978) \div (20 - 3)} = 370.1 \text{ (only two digits are significant)},$$

with $n_1$ = 2 and $n_2$ = 17. Referring to the F table of Appendix M, it is
clear that this F value significantly exceeds 1.0, since it has a probability
of much less than 0.001, and that, therefore, $r_{Y.XX^2}^2$ significantly exceeds
zero.

The procedure for making an estimate of the correlation in the
population is similar to that previously given for linear correlation. That is,

$$\bar{r}_{Y.XX^2}^2 = 1 - \frac{\Sigma y_{sY.XX^2}^2 \div (N - 3)}{\Sigma y^2 \div (N - 1)} = 1 - (1 - r_{Y.XX^2}^2)\frac{N - 1}{N - 3} = 1 - (1 - 0.978)\frac{20 - 1}{20 - 3} = 0.975.$$

Third-degree curve. To ascertain whether the use of $X^3$ in a curve of 
the type 

$$Y_c = a + bX + cX^2 + dX^3$$

explains a significant additional amount of variation, compute 

$$r^2_{YX^3.XX^2} = \frac{r^2_{Y.XX^2X^3} - r^2_{Y.XX^2}}{1 - r^2_{Y.XX^2}}$$

and then make a $t$ test using 

$$t = \sqrt{\frac{r^2_{YX^3.XX^2}\,(N - 4)}{1 - r^2_{YX^3.XX^2}}},$$

with $n = N - 4$. The equivalent $F$ test is 

$$F = \frac{(\Sigma y^2_{cY.XX^2X^3} - \Sigma y^2_{cY.XX^2}) \div 1}{(\Sigma y^2 - \Sigma y^2_{cY.XX^2X^3}) \div (N - 4)} = \frac{(r^2_{Y.XX^2X^3} - r^2_{Y.XX^2}) \div 1}{(1 - r^2_{Y.XX^2X^3}) \div (N - 4)},$$

with $n_1 = 1$ and $n_2 = N - 4$. 

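To test the hypothesis that the population correlation is zero, compute 

$$F = \frac{r^2_{Y.XX^2X^3} \div (4 - 1)}{(1 - r^2_{Y.XX^2X^3}) \div (N - 4)} = \frac{\Sigma y^2_{cY.XX^2X^3} \div (4 - 1)}{\Sigma y^2_{sY.XX^2X^3} \div (N - 4)},$$

with $n_1 = 4 - 1$ and $n_2 = N - 4$. Remember that 
$\Sigma y^2_{sY.XX^2X^3} = \Sigma y^2 - \Sigma y^2_{cY.XX^2X^3}$. 

The estimate of the correlation in the population is 

$$\bar{r}^2_{Y.XX^2X^3} = 1 - \frac{\Sigma y^2_{sY.XX^2X^3} \div (N - 4)}{\Sigma y^2 \div (N - 1)} = 1 - (1 - r^2_{Y.XX^2X^3})\frac{N - 1}{N - 4}.$$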

The reader can readily adapt these expressions for curves of a higher 
order. That, however, should rarely be necessary, since third-degree 
curves are not often used and curves of higher order are even more 
infrequently employed. 

The correlation ratio. For the data of yield per acre of broom corn and 
man hours per ton, we found in Chapter 20 that 

$$\eta^2_{Y.X} = \frac{\text{Variation explained by column means}}{\text{Total variation of the } Y \text{ series}} = \frac{148.115}{217.515} = 0.681.$$

If a second-degree curve is fitted to the same data, we get²⁰ 

$$r^2_{Y.XX^2} = \frac{\Sigma y^2_{cY.XX^2}}{\Sigma y^2} = \frac{140.743}{217.515} = 0.647.$$

²⁰ For the correlation analysis of these data using a second-degree curve, see the 
first edition of this text, pp. 721-727. 
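To ascertain whether $\eta^2_{Y.X}$ is significantly larger than $r^2_{Y.XX^2}$, compute 

$$F = \frac{(\eta^2_{Y.X} - r^2_{Y.XX^2}) \div \text{Degrees of freedom}}{(1 - \eta^2_{Y.X}) \div \text{Degrees of freedom}} = \frac{(0.681 - 0.647) \div (11 - 2)}{(1 - 0.681) \div (103 - 12)} = \frac{0.00378}{0.00351} = 1.1,$$

with $n_1 = 9$ and $n_2 = 91$. Or, we may use 

$$F = \frac{\left[\left(\begin{array}{c}\text{Variation explained}\\ \text{by column means}\end{array}\right) - \left(\begin{array}{c}\text{Variation explained by}\\ \text{second-degree curve}\end{array}\right)\right] \div \text{Degrees of freedom}}{\left[\left(\begin{array}{c}\text{Total variation}\\ \text{of the } Y \text{ series}\end{array}\right) - \left(\begin{array}{c}\text{Variation explained}\\ \text{by column means}\end{array}\right)\right] \div \text{Degrees of freedom}}$$

$$= \frac{(148.115 - 140.743) \div (11 - 2)}{(217.515 - 148.115) \div (103 - 12)} = \frac{0.8191}{0.7626} = 1.1,$$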


with $n_1 = 9$ and $n_2 = 91$. The degrees of freedom in the numerator 
represent the difference between the degrees of freedom for explained 
variation using the column means (which is 11) and the degrees of freedom 
for explained variation using the second-degree curve (which is 2). The 
number of degrees of freedom for explained variation using the column 
means is $12 - 1 = 11$ because there were 12 column means and the 
variation of those means was computed in relation to $\bar{Y}$. The number 
of degrees of freedom for explained variation using the second-degree 
curve is $3 - 1 = 2$ because the equation has three constants and the 
variation of the computed values was taken around $\bar{Y}$. The degrees of 
freedom in the denominator, for the variation unexplained by the column 
means, are $N$ minus the number of column means, that is, $103 - 12 = 91$. 

Referring to Appendix M to ascertain the probability of $F = 1.1$ when 
$n_1 = 9$ and $n_2 = 91$, we find that neither $n_1 = 9$ nor $n_2 = 91$ is shown in 
the table. However, it is not necessary to interpolate. By looking at 
the $F$ values when $n_1 = 8$ and 12 and $n_2 = 60$ and 120, it is clear that the 
probability is greater than 0.10 and that $\eta_{Y.X}$ is not significantly larger 
than $r_{Y.XX^2}$. 
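As a rough check on the arithmetic of this comparison, the ratio can be recomputed from the variation figures quoted in the text. This is a minimal Python sketch; the helper name `f_ratio` and the variable names are illustrative assumptions, and no probability is computed because that still requires an $F$ table such as Appendix M.

```python
def f_ratio(ss_num, df_num, ss_den, df_den):
    """Return F = (ss_num / df_num) / (ss_den / df_den)."""
    return (ss_num / df_num) / (ss_den / df_den)


total = 217.515            # total variation of the Y series
by_column_means = 148.115  # explained by the 12 column means
by_curve = 140.743         # explained by the second-degree curve
n_obs, n_columns, n_constants = 103, 12, 3

f_value = f_ratio(
    by_column_means - by_curve,
    (n_columns - 1) - (n_constants - 1),  # 11 - 2 = 9
    total - by_column_means,
    n_obs - n_columns,                    # 103 - 12 = 91
)
print(round(f_value, 1))
```

Working from the unrounded variations gives a value slightly below 1.1, which rounds to the figure used in the text.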

To determine whether $\eta^2_{Y.X}$ is significantly greater than zero, we use 
expressions for $F$ similar to those previously employed for the same 
purpose for non-linear coefficients. They are 

$$F = \frac{\eta^2_{Y.X} \div (\text{Degrees of freedom} = \text{Number of column means} - 1)}{(1 - \eta^2_{Y.X}) \div (\text{Degrees of freedom} = N - \text{Number of column means})} = \frac{0.681 \div (12 - 1)}{(1 - 0.681) \div (103 - 12)} = \frac{0.0619}{0.00351} = 17.7$$

and 

$$F = \frac{\left(\begin{array}{c}\text{Variation explained}\\ \text{by column means}\end{array}\right) \div \left(\begin{array}{c}\text{Degrees of freedom} = \text{Number}\\ \text{of column means} - 1\end{array}\right)}{\left[\left(\begin{array}{c}\text{Total variation of}\\ \text{the } Y \text{ series}\end{array}\right) - \left(\begin{array}{c}\text{Variation explained by}\\ \text{column means}\end{array}\right)\right] \div \left(\begin{array}{c}\text{Degrees of freedom} = N - \text{Number of}\\ \text{column means}\end{array}\right)}$$

$$= \frac{148.115 \div (12 - 1)}{(217.515 - 148.115) \div (103 - 12)} = \frac{13.46}{0.763} = 17.6.$$

For this value of $F$, $n_1 = 11$ and $n_2 = 91$. Neither of these is tabled in 
Appendix M; but, looking up $n_1 = 8$ or 12 and $n_2 = 60$ or 120, it is clear 
that $F = 17.7$ is far beyond the upper 0.001 point; $\eta^2_{Y.X}$ is significantly 
greater than zero. 
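The two expressions for $F$ are algebraically the same, which a short computation can confirm. This is a hedged Python sketch using the broom-corn figures quoted in the text; the variable names are illustrative assumptions.

```python
total = 217.515            # total variation of the Y series
by_column_means = 148.115  # variation explained by column means
n_obs, n_columns = 103, 12

eta_squared = by_column_means / total

# First form: in terms of the correlation ratio.
f_from_eta = (eta_squared / (n_columns - 1)) / (
    (1.0 - eta_squared) / (n_obs - n_columns)
)

# Second form: in terms of the variations themselves.
f_from_variation = (by_column_means / (n_columns - 1)) / (
    (total - by_column_means) / (n_obs - n_columns)
)

print(round(eta_squared, 3), round(f_from_eta, 2), round(f_from_variation, 2))
```

The small differences among the printed results come from rounding the intermediate quotients before dividing.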

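The value of $\bar{\eta}^2_{Y.X}$, an estimate for the population, is 

$$\bar{\eta}^2_{Y.X} = 1 - \frac{\left[\left(\begin{array}{c}\text{Total variation of}\\ \text{the } Y \text{ series}\end{array}\right) - \left(\begin{array}{c}\text{Variation explained by}\\ \text{column means}\end{array}\right)\right] \div (N - \text{Number of column means})}{(\text{Total variation of the } Y \text{ series}) \div (N - 1)}$$

or 

$$\bar{\eta}^2_{Y.X} = 1 - (1 - \eta^2_{Y.X})\frac{N - 1}{N - \text{Number of column means}} = 1 - (1 - 0.681)\frac{102}{91} = 0.642.$$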

Multiple correlation. When dealing with multiple correlation 
coefficients, we are primarily interested in knowing whether a given $R^2$ 
(or $R$) value is significant. We shall not use the example of Chapter 21 
as an illustration, because the data used there were not a sample. Instead 
we shall consider a four-variable problem dealing with the physical 
measurements of 27 white boys who were 12, 13, or 14 weeks old.²¹ The 
variables were: 

$X_1$, weight in kilograms, 
$X_2$, height in centimeters, 
$X_3$, head circumference in centimeters, and 
$X_4$, chest circumference in centimeters. 

We shall test $R^2_{1.23}$ and $R^2_{1.234}$, and, to do that, we need the following 
values: 

$N = 27$. 
$\Sigma x_1^2 = 11.6258$. 
$\Sigma x^2_{c1.23} = 9.1085$; $\Sigma x^2_{s1.23} = 2.5173$; $R^2_{1.23} = 0.783$. 
$\Sigma x^2_{c1.234} = 10.0158$; $\Sigma x^2_{s1.234} = 1.6100$; $R^2_{1.234} = 0.861$. 

²¹ These and other data for boys and girls of various ages were supplied by the 
New York Foundling Hospital, courtesy of Dr. Alfred J. Vignec. Miss Marion C. 
Gentile kindly transcribed the figures. 

To ascertain whether a multiple coefficient of determination significantly 
exceeds zero, we employ an $F$ test, similar to those used for the 
same purpose for non-linear coefficients. In general form, we may use 
either²² 

$$F = \frac{R^2_{1.234 \cdots m} \div (m - 1)}{(1 - R^2_{1.234 \cdots m}) \div (N - m)}$$

or 

$$F = \frac{\Sigma x^2_{c1.234 \cdots m} \div (m - 1)}{\Sigma x^2_{s1.234 \cdots m} \div (N - m)},$$

with $n_1 = m - 1$ and $n_2 = N - m$. 

Using the first expression to test $R^2_{1.23}$ gives 

$$F = \frac{0.783 \div (3 - 1)}{(1 - 0.783) \div (27 - 3)} = \frac{0.392}{0.00904} = 43.4,$$

with $n_1 = 2$ and $n_2 = 24$. From Appendix M, the value obtained for 
$F$ is seen to be far beyond the upper 0.001 point, and $R^2_{1.23}$ is clearly 
significant. 

²² The equivalence of the two expressions is fairly obvious; in the denominator of 
the second expression, write $\Sigma x_1^2 - \Sigma x^2_{c1.234 \cdots m}$ in place of $\Sigma x^2_{s1.234 \cdots m}$; divide 
the numerator and the denominator by $\Sigma x_1^2$; the result is the first expression. 
Again using the first of the two expressions, but this time to test 
$R^2_{1.234}$, we obtain 

$$F = \frac{0.861 \div (4 - 1)}{(1 - 0.861) \div (27 - 4)} = \frac{0.287}{0.00604} = 47.5,$$

with $n_1 = 3$ and $n_2 = 23$. $R^2_{1.234}$ is also significant. 
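The general expression can be wrapped in a small function to reproduce both tests. This is a minimal Python sketch; the name `f_for_determination` and its arguments are illustrative assumptions, and the tabled probability must still be looked up separately.

```python
def f_for_determination(r2, n_obs, n_constants):
    """F = [R^2 / (m - 1)] / [(1 - R^2) / (N - m)].

    Returns the F value together with its degrees of freedom (n1, n2).
    """
    n1 = n_constants - 1
    n2 = n_obs - n_constants
    return (r2 / n1) / ((1.0 - r2) / n2), n1, n2


# Physical measurements of the 27 boys.
f_123, n1_123, n2_123 = f_for_determination(0.783, 27, 3)
f_1234, n1_1234, n2_1234 = f_for_determination(0.861, 27, 4)
print(round(f_123, 1), n1_123, n2_123)
print(round(f_1234, 1), n1_1234, n2_1234)
```

Because the function divides without rounding the intermediate quotients, the first result comes out near 43.3 rather than the 43.4 obtained from the rounded figures 0.392 and 0.00904.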

Occasionally one may wish the value of $\bar{R}^2_{1.234 \cdots m}$, the estimated 
coefficient of multiple determination in the population. This is 

$$\bar{R}^2_{1.234 \cdots m} = 1 - \frac{\Sigma x^2_{s1.234 \cdots m} \div (N - m)}{\Sigma x_1^2 \div (N - 1)} = 1 - (1 - R^2_{1.234 \cdots m})\frac{N - 1}{N - m}.$$

Computing only $\bar{R}^2_{1.234}$ for the data of the 27 white boys, we obtain 

$$\bar{R}^2_{1.234} = 1 - (1 - R^2_{1.234})\frac{N - 1}{N - m} = 1 - (1 - 0.861)\frac{27 - 1}{27 - 4} = 0.843.$$

Partial correlation. Since a coefficient of partial determination tells 
us the proportion that (1) the additional explained variation attributable 
to a given independent variable is of (2) the unexplained variation before 
the use of that independent variable, we are often interested in knowing 
whether the coefficient differs significantly from zero. The test involves 
computing 

$$t = \sqrt{\frac{r^2_{1m.23 \cdots (m-1)}\,(N - m)}{1 - r^2_{1m.23 \cdots (m-1)}}},$$

with $n = N - m$. 

For the data of the physical measurements of the 27 white boys, 

$$r^2_{14.23} = \frac{R^2_{1.234} - R^2_{1.23}}{1 - R^2_{1.23}} = \frac{\Sigma x^2_{c1.234} - \Sigma x^2_{c1.23}}{\Sigma x^2_{s1.23}}.$$

Using the first expression gives 

$$r^2_{14.23} = \frac{0.861 - 0.783}{1 - 0.783} = 0.359.$$


Variable $X_4$ explained 36 per cent of the variation which $X_2$ and $X_3$ had 
failed to explain. 

For the value of $t$, we get 

$$t = \sqrt{\frac{0.359\,(27 - 4)}{1 - 0.359}} = 3.59,$$

with $n = 23$. From the $t$ table of Appendix I, it is seen that $0.001 < P < 0.01$, 
and we consider $r^2_{14.23}$ to be significant. 
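The partial coefficient and its $t$ value can be recomputed in a few lines. This is a hedged Python sketch with illustrative function names (`partial_determination`, `t_for_partial`); it also checks that the square of $t$ equals the corresponding $F$ with one degree of freedom in the numerator.

```python
import math


def partial_determination(r2_with, r2_without):
    """Share of previously unexplained variation explained by the added variable."""
    return (r2_with - r2_without) / (1.0 - r2_without)


def t_for_partial(r2_partial, n_obs, n_constants):
    """t = sqrt(r^2 (N - m) / (1 - r^2)), with n = N - m degrees of freedom."""
    return math.sqrt(r2_partial * (n_obs - n_constants) / (1.0 - r2_partial))


r2_14_23 = partial_determination(0.861, 0.783)
t_value = t_for_partial(r2_14_23, 27, 4)
f_value = (r2_14_23 / 1) / ((1.0 - r2_14_23) / (27 - 4))
print(round(r2_14_23, 3), round(t_value, 2), round(f_value, 2))
```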

In similar fashion, it may be ascertained whether $r^2_{13.24}$ and $r^2_{12.34}$ are 
significant. Without making the tests here, we shall merely note that 
$r^2_{12.34}$ is significant at the 0.01 level and that $r^2_{13.24}$ is not significant, even 
at the 0.05 level, since $P$ for $r^2_{13.24}$ is between 0.30 and 0.40. This does not 
tell us that we should necessarily exclude $X_3$ from our analysis, since $X_3$ 
may contribute some useful information even though we have not been 
able to demonstrate its significance. However, if we desired to use but 
two independent variables, they should, of course, be $X_2$ and $X_4$. 

As noted on page 728, the $t$ test is an alternative to the $F$ test for testing 
the significance of a partial coefficient of determination. The $F$ test, in 
general terms, is 

$$F = \frac{r^2_{1m.23 \cdots (m-1)} \div [m - (m - 1)]}{(1 - r^2_{1m.23 \cdots (m-1)}) \div (N - m)},$$

where $m - (m - 1)$ is, of course, always 1. That this expression for $F$ 
and the square of that given above for $t$ are the same is demonstrated in 
Appendix S, section 26.4. 

In rare instances one may wish to know whether a coefficient of partial 
determination differs significantly from a population value which is not 
zero. Such a test may be made in exactly the same fashion as for the 
simple linear correlation coefficient (see pages 722-723), with the standard 
error of $z$ being 

$$\sigma_z = \frac{1}{\sqrt{N - 3 - (m - 2)}} = \frac{1}{\sqrt{N - m - 1}},$$

where $m$ is the number of variables involved, which is the same as the 
number of constants in the multiple estimating equation, since we are 
considering only linear multiple correlation. 
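If one wishes the value of $\bar{r}^2_{1m.23 \cdots (m-1)}$, the estimate for the population, 
it may be obtained from 

$$\bar{r}^2_{1m.23 \cdots (m-1)} = 1 - \frac{\Sigma x^2_{s1.234 \cdots m} \div (N - m)}{\Sigma x^2_{s1.234 \cdots (m-1)} \div [N - (m - 1)]},$$

or, if we divide the numerator and denominator each by $\Sigma x_1^2$, from 

$$\bar{r}^2_{1m.23 \cdots (m-1)} = 1 - \frac{(1 - R^2_{1.234 \cdots m}) \div (N - m)}{(1 - R^2_{1.234 \cdots (m-1)}) \div [N - (m - 1)]} = \frac{\bar{R}^2_{1.234 \cdots m} - \bar{R}^2_{1.234 \cdots (m-1)}}{1 - \bar{R}^2_{1.234 \cdots (m-1)}}.$$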
APPENDIX D 


Ordinates of the Normal Curve 


Erected at Distances x/s from X̄, Expressed as Decimal Fractions of the 
Maximum Ordinate Y₀ 

The maximum ordinate is computed from the expression Y₀ = Ni / (s√(2π)) = Ni / (2.5066s). 

The values tabled below result from solving the expression e^(−x²/2s²). 

The proportional height of an ordinate to be erected at any given value on the X axis 
can be read from the table by determining x (the deviation of the given value from the 
mean) and computing x/s. Thus, if X̄ = $25.00, s = $4.00, Y₀ = 1950; and it is 
desired to ascertain the height of an ordinate to be erected at $23.00; x = $2.00 and 
x/s = $2.00/$4.00 = 0.50. From the table the ordinate is found to be 0.88250 of the 
maximum ordinate Y₀, or 0.88250 × 1950 = 1721. 
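The tabled fraction is simply the exponential term evaluated at x/s, so the worked example can be reproduced directly. This is a minimal Python sketch; the function name `ordinate_fraction` is an illustrative assumption.

```python
import math


def ordinate_fraction(x, s):
    """Height of the normal-curve ordinate at deviation x, as a fraction of Y0."""
    return math.exp(-((x / s) ** 2) / 2.0)


# Example from the text: mean $25.00, s = $4.00, Y0 = 1950, ordinate at $23.00.
fraction = ordinate_fraction(2.00, 4.00)
height = fraction * 1950
print(round(fraction, 5), round(height))
```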
x/s 

0 

.01 

.02 

03 

.04 

.05 

.06 

07 

M 

.09 

0 0 

1 00000 

9999.5 

99080 

.90955 

99020 

.90875 

.90820 

.99755 

.09685 

.99598 

O.i 

.99501 

.99390 

092S3 

90158 

0)025 

.98831 

.98728 

.93565 

.98393 

982i: 

0.2 

.98020 

.97819 

.97609 

.97390 

97161 

96923 

90070 

.06420 

.90150 

E1I9 

0.3 

.95600 

.95309 

95010 

94702 

.94387 

.94055 

.03723 

93382 

.93024 


0.4 

.92312 

.91939 

.91558 

.91169 

.90774 

.00371 

.89961 

-89543 

.89119 

.88633 


.88250 

.87805 

.87353 

.86890 

EfflSSI 

.85002 

-85483 

.85006 

.84519 

.84025 

06 

.83527 

.83023 

.82.514 

stOO'' 

Hiyi 

.80957 

80129 

.70896 

.70359 

.78817 

0.7 

.78270 

.77721 

.77167 

.76610 

70018 

75184 

.74016 

.74342 

.73709 

.73103 

0.8 

,72615 

.720.33 

.71448 

.raset 

70272 

.6968’ 

69087 

,63493 

.07896 

.67293 

0.9 

.66698 

.00097 

65494 

.64S01 

.6-1237 

.63083 

C3077 

.62472 

.61865 

.61259 

1,0 

.60653 

.60047 

.59440 

-58831 

.58223 

.57623 

.57017 

.56414 

.55810 

.55209 

. 1 

.MG07 

.54007 

.5.3400 

52812 

5221 i 

.51020 

.51027 

50437 

49818 

.49260 

1.2 

.48675 

.48092 

47511 

400.13 


45783 

45212 

.44644 

.41073 

.43516 

t.3 

.42956 

.42399 

41845 

41294 

40/47 

.40202 

SObOl 

.39123 

38589 

.38058 

1.4 

.37531 

-37007 

.36487 

.35971 

.35459 

.34950 

34415 

.33944 

.33417 

-32954 

1,5 

.32465 

JH980 

.31500 

31023 

.30550 

,30082 

29C18 

29158 

.23702 

.28251 

1.6 

,27804 

J273GI 

26923 

26489 

26059 

,25034 

252! 3 

21707 

,21385 

.23978 

17 

: .23575 

.23176 

.22782 

22392 

.22008 

.21027 

.21251 

.20879 

.20511 

.20148 

1.8 

.19790 

.19436 

19086 

.18711 , 

.18100 

.18004 

.17732 

.17404 

.170SI 

16762 

1.9 

.IC448 

.16137 

15831 

.155.30 

.15232 

14930 

.14050 

.14364 

.14083 

,13806 

2.0 

,13534 

.13265 

.13000 

.12740 

.1248.3 

.12230 j 

119.81 

.11737 

11496 

.11259 

2.1 

.11025 

.10795 

.10570 

.10347 

.10129 

.00914 I 

.09702 

.00495 

.09290 

.99090 

2.2 

.08892 

.08098 

08507 

0S320 

.081.30 

.07956 

,07778 

07004 

.074.33 , 

.07265 

2.3 

.07100 

.00939 

.00780 

.00024 

.06471 

.00321 1 

06174 

.06020 

0588S 1 

05750 

2.4 

,05014 

,05481 

05350 

05222 

05096 

j .04973 

,04852 

.04734 

.0-1618 

.04505 

25 

.04394 

,04285 

04179 

.04074 

03972 


i 03775 ' 

.03680 

.03566 

.03494 

2.6 

.03405 

03317 

.03232 

03118 

03066 

.02980 

; ^;2908 

.02831 

02757 

.02684 

2.7 

.02612 

02.542 

02474 

02468 

0234.3 


02218 

■0/4 Mr^ 

■OWI/KM 

.02040 

2.8 

.01981 

01929 

018^6 

.01.823 

01773 

.01723 

.01074 i 

01027 

.01581 

.01536 

29 

.01492 

01449 

1 .01408 , 

.01.367 

.01328 

.012^^ 


.01215 

.01179 

.01145 


2 

• 

0 

.1 

.2 1 

J 3 j 

.4 

.5 

.6 

.7 

.8 

• 

3. 

4. 

6. 

.01111 1 
.00034 1 
.00000 ; 

.00819 

.00022 

.00593 

.00015 

.00432 

.00010 

.00309 1 

00006 

00219 

.00004 

.00153 

00003 

! 

.ocdoo 

,00002 

.00073 1 
.00001 

I 

.00050 

.00001 


Largely from Rugg's Statistical Methods Applied to Education, by arrangement with the 
publishers, Houghton Mifflin Company. More detailed tables of normal-curve ordinates may 
be found in E. S. Pearson and H. O. Hartley, Biometrika Tables for Statisticians, Volume I, 
Cambridge University Press, Cambridge, 1954, pp. 104-110; in Karl Pearson, Tables for 
Statisticians and Biometricians, Part I, The University Press, Cambridge, England, 1948 (third 
edition), pp. 2-8; and in Federal Works Agency, Work Projects Administration for the City of 
New York, Tables of Probability Functions, National Bureau of Standards, New York, 1942, 
Vol. II, pp. 2-238. The values shown in these tables should be multiplied by √(2π) = 2.5066 to 
agree with those shown above. 
APPENDIX E 


Areas Under the Normal Curve 

From the Arithmetic Mean to Distances* x/s or x/σ from the Arithmetic 
Mean, Expressed as Decimal Fractions of the Total Area 1.0000 


This table shows 
the black area: 
.4162 

llfl 

1.4 

.4192 

.4207 

.4222 

.4236 

.4251 

.4265 

.4270 

.4292 

.4306 

isei 

1.5 

.4332 

.4345 

.4357 

.4370 

.4382 

.4394 

.4406 

.4413 

.4429 

.4441 

1.6 

.4452 

.4463 

.4474 

.4484 

.4495 

.4505 

.4515 

.4525 

.4535 

.4545 

1.7 

.4554 

.4564 

.4573 

.4582 

.4591 

4599 

.4608 

.4010 

.4625 

.4633 

1.8 

.4641 

.4649 

.4656 

.4664 

.4671 

.4678 

.4686 

.4693 

.4609 

.4706 

1.9 

.4713 

.4719 

.4726 

.4733 

.4738 

.4744 

.4750 

.4756 

.4761 

.4767 

7.0 

.4772 

.4778 

.4783 

.4788 

.4793 

.4798 

.4803 

.4808 

.4812 

,4817 

2.1 

.4821 

.4326 

.4830 

.4834 

.4838 

.4842 

.4846 

.4850 

.48.54 

.4857 

2.2 

.4861 

.4864 

'.4868 

.4871 

,4875 

.4878 

.4881 

.4864 

.4887 

.4800 

2.3 

.4893 

.4890 

.4898 

.4901 

.4904 

.4906 

.4009 

.4911 

.4913 

.4916 

2.4 

.4918 

.4920 

.4922 

,4925 

.4927 

.4020 

.4931 

.4032 

.4934 

.4930 

2.6 

.4938 

.4040 

.4941 

-4943 1 

.4945 1 

.4946 

.4948 

.4949 

.4951 

.4953 

2.6 

.4953 

.4955 

.4956 

,4957 i 

.4959 1 

.4960 

.4961 

.4062 

.4963 

.4964 

27 

.4965 

.4066 

.4067 

.4968 

.4969 ! 

.4970 

.4971 

.4072 

.4973 

.4974 

3.8 

.4974 

.4975 

.4976 

.4977 

.4977 

.4973 

.4079 

.4979 

.4080 

.4981 

2.0 

.4981 

.4982 

.4982 

.4983 

.4984 

.4984 

.4085 

.4985 

.4986 

.4986 

8.0 

.49865 

.4987 1 

.4987 

.4988 

.4988 

.4089 

.4989 

.4989 

.4990 

.4900 

8.1 

3.2 

3.3 
34 
85 
36 

3.7 

3.8 

3.0 

4.0 
4.5 
50 

.49903 

.4993129 

.49951 G 6 

.4966631 

.4997674 

.4998400 

.4998922 

.4999277 

.4990519 

.4999683 

.4099966 

.4990997133 

.4901 

.4991 

.4991 

.4992 

.4992 

.4902 

.4092 

.4093 

.4093 


* The expression x/s is used when fitting a normal curve (pp. 590-607); x/σ is employed when 
making a test of significance involving the standard deviation of the population and the normal 
curve (pp. 635-642, 663-666, 670-671, 673-675, 679-680, and 723-726). 

Largely from Rugg's Statistical Methods Applied to Education (with corrections), by arrangement 
with the publishers, Houghton Mifflin Company. A more detailed table of normal-curve 
areas, but in two directions from the arithmetic mean, is given in Federal Works Agency, Work 
Projects Administration for the City of New York, Tables of Probability Functions, National 
Bureau of Standards, New York, 1942, Vol. II, pp. 2-338. 
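Entries of this kind can be regenerated from the error function, which is useful when a scanned table is hard to read. This is a minimal Python sketch using only the standard library; the function name `area_from_mean` is an illustrative assumption.

```python
import math


def area_from_mean(z):
    """Area under the normal curve between the mean and z standard deviations."""
    return 0.5 * math.erf(z / math.sqrt(2.0))


# A few entries for comparison with the table (row value plus column value).
for z in (1.50, 1.96, 2.00):
    print(f"{z:.2f}  {area_from_mean(z):.4f}")
```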
APPENDIX F 


Values of F₂ 

For Use in Fitting Curves of the Type 



X 

.00 

.01 

.02 

.03 

.04 

.06 

.06 

-07 

.08 

.09 

,0 

.00000 

.00001 

00004 

.00009 

00016 

00025 

.00036 

.00049 

00064 

.00081 

.1 

.00099 

.00120 

.00143 

.00167 

.00104 

,00222 

.00253 

,00285 

00319 

.00355 

.2 

.00392 

.00432 

.00473 

.00510 

.00661 

.00607 

.00666 

.00705 

,00757 

.00810 

.3 

.00865 

.00921 

.00979 

1 01038 

01099 

.01161 

.01225 

.01290 

.01356 

.01424 

.4 

.01493 

01564 

.01635 

[ 01703 

01782 

• .01867 

.01933 

02011 

02089 

.02168 

.5 

.02248 

.02329 

! 02411 

.02494 

02578 

.02662 

* 02748 

.02833 

.02020 

.03007 

.6 

.0.3095 

1 .0318,3 

.03272 

03:161 

034.50 

.03540 

.03631 . 

.03721 

.03812 

.03904 

.7 

.03995 

.04086 

! 04178 

.04270 

04362 

04453 

.04546 

.04637 

.04728 

,04820 

.8 

.04911 

06002 

05093 

.06183 

-05274 

.05363 

.05463 

.05542 

.06031 

.05719 

.9 

.06806 

06894 

.06080 

OGOCiO 

00162 

.06236 

.00320 

06404 

.06486 

.06568 

1.0 

.06049 

0G729 

.06809 

.06887 

00965 

.07042 

.07118 

.07193 

.07267 

.07340 

1.1 

07412 

07483 

.07.552 

07621 

.07GS9 

.073 uG 

.07822 

.07880 

.07950 

.08012 

12 

.08073 

.03133 

,08192 

08250 

08306 

.08361 

' 03416 

.08-108 

.08520 

.08571 

1.3 

0&620 

.0S668 

.08715 

0.S700 

.0.S.S05 

.08848 

.08890 

08930 

.08970 

.09008 

1.4 

.09045 

.09080 

.09115 

.09148 

09180 

.09211 

1 .09241 

.09269 

.09296 

.00322 


09.347 

.09371 

.09394 

0941.5 

' 00435 

.094.54 

.09472 

.09489 

.09505 

.09519 

10 

.09533 

.00546 

09557 

09507 

! 09.377 

-09585 

.09592 

.09599 

.09604 

.09008 

1 7 

.09612 

09G14 

09G16 

.09010 

1 09610 

.09615 

.09013 

.09010 

1 .09606 

-09602 

1 ii 

09597 

O9500 ! 

095H4 

095 76 

i 095C3 

.095.59 

09549 

.09539 

.09627 

.09516 

1 0 

.09503 

.09490 

00477 

09463 

.00448 

09433 

.09417 

l)9-i01 

.09384 

.09366 

2 0 

09349 

.00330 

.00312 

09293 ! 

! 0Q273 

i .09253 

09233 

09213 

.09192 

.09170 

2 1 

.09149 

.09 127 

.09105 

09082 1 

1 OUOGO 

I ,00037 

.09014 

.08991 

.08067 

.08943 

22 

.08019 

.0889.'i 

08871 

088-17 ! 

1 .08823 

08798 

.08774 

.08749 

08724 

.08699 

2.3 

.08674 

.08050 

.08025 

08600 , 

OSS 75 

.08.550 

.08526 

.08500 

.08475 

.08450 

2.4 

.08426 

03401 

08376 

08362 

.08327 

.08303 

.08279 

.08265 

08231 

.08207 

26 

.08183 

.08159 

0813G 

08112 

.08089 

.08006 

08043 

.08020 

.07998 

07975 

2.0 

,07053 

.07031 ' 

.07909 

07888 

.07806 

07845 

.07824 

.07803 

.07782 

07762 

27 

.07742 

07722 

.07702 

07682 ! 

.07603 

,07644 

.07625 

07600 

.07688 

.07569 

2.8 

.07551 

.07634 

07616 

07499 

.07182 

.07465 

.07448 

.07432 

.07416 

.07400 

2.9 

30 

3 1 

3 2 

3.3 

3.4 

3.6 

36 

3 7 

3.8 

3.9 

4.C 

.07384 

.07240 

,07118 

.07016 

0G933 

OGSGG 

.0(.\813 

06771 

00739 

06714 

.06690 

06683 

.07369 

.07354 

07339 

.07324 

.07309 

,0;295 

! 

.07281 

.07267 

.07254 


From W. A. Shewhart, Economic Control of Quality of Manufactured Product, p. 91, 
D. Van Nostrand Company, Inc., New York, 1931. Courtesy of D. Van Nostrand 
Company, Inc., and The Bell Telephone Laboratories. 


For values of $F_2$ beyond the range shown above, use the expression 

$$F_2 = \frac{1}{6\sqrt{2\pi}}\left[1 - \left(1 - \frac{x^2}{s^2}\right)e^{-x^2/2s^2}\right].$$

The value of $e^{-x^2/2s^2}$ may be conveniently read from the table of ordinates of the normal curve, 
Appendix D, or from a more extensive table in E. S. Pearson and H. O. Hartley, 
Biometrika Tables for Statisticians, Volume I, Cambridge University Press, Cambridge, 
1954, pp. 104-110, and in Karl Pearson, Tables for Statisticians and Biometricians, 
Part I, The University Press, Cambridge, England, 1948 (third edition), pp. 2-8. 

The values for $z$ shown in the last two tables yield $e^{-x^2/2s^2}$ when multiplied by 2.5066. 
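The following Python sketch evaluates $F_2$ under the assumption that the expression is $F_2 = \frac{1}{6\sqrt{2\pi}}\left[1 - (1 - x^2/s^2)\,e^{-x^2/2s^2}\right]$. That form is a reconstruction, not a quotation; it is adopted here because it reproduces legible entries of the table above (for example at x/s = 0.1, 1.0, and 2.0). The function name `f2` is illustrative.

```python
import math


def f2(z):
    """Assumed form of F2 at z = x/s; see the hedged lead-in above."""
    return (1.0 - (1.0 - z * z) * math.exp(-z * z / 2.0)) / (
        6.0 * math.sqrt(2.0 * math.pi)
    )


# Compare with legible table entries, then go beyond the tabled range.
for z in (0.1, 1.0, 2.0, 4.5):
    print(f"{z:.1f}  {f2(z):.5f}")
```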
APPENDIX G 


Areas in One Tail of the Normal 
Curve at Selected Values* of x/s or x/σ 
from the Arithmetic Mean 


This table shows 
the black area: 



ior * 

S V 

.00 

.01 

.02 

.03 

.04 

.05 

.06 

.07 

.08 

-09 

0.0 

.5000 

.4960 

4920 

4880 

.4340 

.4801 

4761 

.4721 

.4681 

.4641 

0.1 

.4602 

4562 

.4322 

.4483 

4443 

.4404 

.4364 

.4325 

.4286 

.4247 

0.2 

4207 

4168 

.4129 

.4090 

.4052 

.4013 

3974 

.3936 

.3897 

.3859 

0.3 

.3821 

3783 

.3745 

.3707 

3669 

.3632 

.3594 

3557 

.3520 

.3483 

OA 

3446 

3409 

3372 

.3336 

3300 

.3264 

.3228 

3192 

.3156 

.3121 

0.5 

.3085 

.3050 

3015 

.2081 

,2946 

.2912 

.2877 

.2343 

.2810 

.2776 

0.6 

.2743 

.2709 

.2676 

2643 

2611 

.2578 

.2546 

.2514 

.2483 

.2451 

0.7 

.2420 

.2389 

.2358 

2327 

2296 


.2236 

.2206 

.2177 

.2r48 

0.8 

.2119 

.2090 

2061 

2033 

2005 

BiTfl 

1949 

.1922 

.1894 

.1867 

0.9 

1841 

.1814 

.1788 

1762 

.1736 

Bx il 

.1685 

.1660 

.1635 

.1611 

I.O 

.1587 

.1562 

.1539 

.1515 

1492 

.1469 

.1440 

.1423 

.1401 

.1370 

X.l 

.1357 

,1335 

.1314 

.1292 

.1271 

.1251 

1230 

.1210 

1190 

.1170 

1.2 

.1151 

1131 

.1112 

.1093 

1075 

1056 

1033 

.1020 

.1003 

.0985 

1.3 

.0968 

0051 

.0934 

0918 

.0901 

0885 

0809 

0853 

0838 

.0823 

# 1.4 

.0808 

1 

0793 

.0778 

0704 

0749 

0735 

.0721 

.0708 : 

.0694 

.0681 

1.5 

.0668 

.0655 

0043 

.0630 

.0618 

.0606 

.0594 

.0582 

.0571 

.0559 

1.6 

.0548 

0537 

0526 

0516 

0505 

0495 

.0485 

.0475 

.0405 

.0455 

1.7 

.0446 

.0436 

.045f7 

.0418 

.0409 

0401 

0392 

.0384 

.0375 

.0367 

1 8 

.0359 

.0351 

.0344 

.0336 

.0329 

0322 

.0314 

.0307 

0301 

.0294 

1.9 

.0287 

.0281 

.0274 

.0268 

.0262 

.0256 

.0250 

-0244 

.0239 

0233 

2 0 

.0228 

.0222 

.0217 

.0212 

.0207 

.0202 

.0197 

.0192 

.0188 

.0183 

2.1 

.0179 

.0174 

.0170 

.0166 

.0162 

.0158 

.0154 

0150 

.0146 

.0143 

2 2 

.0139 

Mima 

0132 

0129 

0125 

.0122 

.0119 

.0116 

0113 

.0110 

2.3 

.0107 

.0104 

0102 

.00990 


00939 

.C>0914 

.00889 

.00866 

.00842 

2.4 

.00820 


00776 

,00755 

,00734 

.00714 

.00695 

.00676 

.00657 

.00639 

2.5 

.00621 

.00604 

.005871 

.00670 

.00554 

.00539 

.00523 

.00508 

.00494 

.00480 

2.6 

.00466 

.00453 

00440 

.00427 

.00415 


.00391 

.00379 

00368 

.00357 

2.7 i 

.00347 

.00336 

.00326 

003171 


00298 

.00289 


.00272 

.00264 

2 8 

.00266 

00248 

00240 

00233 

.00226 

00219 

.00212 

00205 

.00199 

.00193 

2.9 

.00187: 

.00181 

00175j 

.001691 

.00164 


.00154 

.00149 

00144 

.00139 


*Of f 

8 V 

B 

B 

.2 

.3 

.4 1 

.5 

.6 

.7 


.9 

3 

.00135 

.0*968 

.0*687 

.0*483 

kBI 

.0*233 

.0*159 

.0*108 

.0*723 

.0*481 

4 

.0‘317 

.0*207 

.0*133 

.0*854 

.0*541 

.0*340 

.0*211 

.0*130 

.0*793 

.0*479 

5 

.0*287 

.0*170 

. 0^996 

.0’570 

.0^333 

.0*190 

.0*107 

.0*599 

.0»332 

.0*182 

0 

.0*987 

.0*530 

.0*282 

.0*149 


.0**402 

.0**206 

,0**104 

.0‘*623 

.0**260 


* See note to Appendix E. 

From Tables of Areas in Two Tails and in One Tail of the Normal Curve, by Frederick E. 
Croxton. Copyright, 1940, by Prentice-Hall, Inc. Permission is given to reproduce this table 
provided credit is given to the author and provided the Prentice-Hall copyright line is included. 
APPENDIX H 


Areas in Two Tails of the Normal 
Curve at Selected Values* of x/s or x/σ 
from the Arithmetic Mean 


This table shows 
the black areas: 



X X 

3 9 

1 

.00 

* .01 

.02 

! .03 

.04 

1 

! .05 

.06 

.07 

08 

.09 

0.0 

1.0000 

.9920 

.9840 

-9761 

0681 

9601 

i .0522 

i .9442 

.9362 

.9283 

0.1 

.9203 

.9124 

0045 

.8966 

.8887 

1 .8808 

i .8729 

j 8650 

.8572 

.3493 

0.2 

,8415 

.8337 

8259 

.8181 

8103 

8026 

.7949 

.7872 

.7795 

7718 

0.3 

,7642 

.7566 

7490 

.7414 

.7339 

7263 

.7188 

! 7114 

( .7039 

.6965 

0.4 

,6892 

.6818 

.6745 

.6672 

0599 

.6527 

.6455 

1 6384 

-6312 

.6241 

0.5 

1 ,6171 

.6101 

.6031 

.5961 

.5392 

.5823 

.5755 

.6687 

.5619 

.5552 

0.6 

1 .5485 

.5419 

.5353 

.5287 

-5222 

.5157 

.5093 

5029 

.4965 

.4902 

0.7 

.4839 

.4777 

.4715 

1 .4654 

.4593 

.4533 

.4473 

,4413 

.4354 

.4295 

vs A 

1 .4237 

.4179 

.4122 

! ,4065 

.4009 

.3953 

.3898 

.3843 

.3789 

.3735 

0.9 

,3681 

.3628 

.3576 

.3524 

.3473 

.3421 

; 3371 

.3320 

3271 

3222 

1.0 

.3173 

.3125 

-3077 

3030 

.2983 

.2037 

.2891 

1 

.2846 

.2801 

.2757 


.2713 

.2670 

.2627 

.2.585 

.2543 

2501 

.2460 

.2420 

.2380 

.2340 


.2301 

,2263 

.2225 

.2187 

.2160 

.2113 

i 2077 

.2041 

.2005 

.1971 


.1936 

.1902 

.1868 

.1835 

[ .1802 

.1770 

.1738 

1707 

,1676 

.1645 

^DEB 

,1615 

.1586 

.1556 

i .1527 

.1499 

.1471 

1443 

1416 

1380 

.1362 

1.5 

.1336 

,1310 

,1285 

.1260 

.1236 

.1211 

.1188 

.1164 

.1141 

.1118 

1.6 

,1096 

.1074 

.1062 

.1031 

.1010 

.0989 

.0969 

.0949 

.0930 

.0910 

1.7 

.0891 

.0873 

.0854 

.0836 

.0819 

.0801 

0784 

.0767 

.0751 i 

0735 

1.8 

.0719 

.0703 

.0688 

.0672 

.0658 

.0643 

.0629 

.0615 

.0601 

.0588 

1.9 

.0574 

.0561 

,0549 

,0536 

0524 

.0512 

.0500 

.0488 

.0477 

.0466 

2.0 

,0455 1 

.0444 

.0434 

.0474 

.0414 

.0401 ; 

! .0394 

.0385 

.0375. 

.0366 

2.1 

.0357 

.0349 

.0340 

.0332 

.0324 

0316 

i 0308 

0300 

.0293 

.0285 

2.2 

.0278 

.0271 

.0264 

.0257 

.0251 

.0244 

.0238 

.0232 

.0226 

0220 

2 3 

.0214 

.0209 

.0203 1 

.0198 

.0193 

0188 1 

.0183 

.0178 

0173 

0168 

2.4 

.0164 

.0160 

.01.55 

.0151 

.0147 

.0143 

.0139 

.0135 

.0131 

.0128 

2.6 

-0124 

.0121 

0117 

.0114 

.0111 

.0108 

.0105 

.0102 

.00988 

00060 

2.6 

.00932 

.00905 

.00379 

.00854 

.00829 

.00805 

.00781 

00759 

.00736 

.00715 

2.7 

.00693 

.00673 

00653 

.00633 

.00614 

.00596 

.00578 

.00561 

.00544 

.00527 

2.8 

.00511 

.00495 

.00480 

.00465 

.00451 

.00437 

.00424 

OOllO 

00398 

.00385 

2.9 

.00373 

.00361 

00350 

.00339 

.00328 

.00318 

.00308 

00298 

0028S 

00279 


Bor £ 

• 

.0 

.1 

.2 

.3 

.4 

.5 

,6 

.7 

.8 

,9 

3 

.00270 

.00194 

.00137 

.0>9v.: 


.0M65 

.0*318 

.0*216 

.0*145 

.0*962 

4 

.0*633 

.0*413 

,0*267 

.0*171 


.0*680 

.0*422 

.0*260 

.0*159 

.0*958 

5 

.0W3 

.0*340 

.0*199 

,0*116 

.0’666 

,0’380 

.0'214 

.0*120 

,0*663 

.0*364 

3 

.0*197 

.0*106 

.0*565 

,0*298 

.0*155 

.0»«803 

,0^*411 

.0**208 

0**105 

.0**520 


* See note to Appendix E. 

From Tables of Areas in Two Tails and in One Tail of the Normal Curve, by Frederick E. 
Croxton. Copyright, 1940, by Prentice-Hall, Inc. Permission is given to reproduce this table 
provided credit is given to the author and provided the Prentice-Hall copyright line is included. 
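One-tail and two-tail areas are related to each other and to the complementary error function, so individual entries can be verified without the tables. This is a minimal Python sketch; the function names `one_tail_area` and `two_tail_area` are illustrative assumptions.

```python
import math


def one_tail_area(z):
    """Area in one tail of the normal curve beyond z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def two_tail_area(z):
    """Area in both tails beyond +z and -z; twice the one-tail area."""
    return math.erfc(z / math.sqrt(2.0))


for z in (1.00, 1.96, 3.00):
    print(f"{z:.2f}  one tail {one_tail_area(z):.5f}  two tails {two_tail_area(z):.5f}")
```

Using `math.erfc` rather than `1 - erf` keeps precision for the very small areas found in the far tails.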
APPENDIX 

Values 

For Given Degrees of Freedom (n) and 


This table shows the black 


Level of significance (P) 


n 

.90 

.80 

.70 

.60 

.50 

.40 

.30 

.25 

1 

. 158 

.325 

.510 

.727 

1.000 

1 - 376 

1.963 

2-414 

2 

. 142 

.289 

.445 

.617 

.816 

1.061 

1-386 

1.604 

3 

.137 

.277 

.424 

.584 

.765 

.978 

1 .250 

1.423 

4 

.134 

.271 

.414 

.569 

.741 

.941 

1.190 

1.344 

5 

. 132 

.267 

.408 

.559 

.727 

.920 

1.156 

1.301 

6 

.131 

.265 

.404 

.553 

.718 

.906 

1.134 

1.273 

7 

. 130 

.263 

402 

.549 

.711 

896 

1 . 119 

1.254 

8 

. 130 

262 

.399 

.646 

.706 

.889 

1. 108 

1.240 

O 

. 129 

.261 

-398 

.543 

.703 

.883 

1 . lOO 

1 . 230 

lO 

. 129 

.260 

.397 

.542 


.879 

1 . 093 

1.221 

11 

. 129 

.200 

.396 

- 540 

.697 

.876 

1.088 

1.214 

12 

. 128 

.259 

305 

.539 

.695 

.873 

1.083 

1.209 

13 

. 128 

250 

394 

.538 

.694 

.870 

1 .070 

1 . 204 

14 

. 128 

.258 

.393 

.537 

,692 

.808 

1 076 

1.200 

15 

.128 

258 

.393 

.536 

.691 

.806 

1 . 074 

1.197 

16 

. 128 

.258 

.392 

-535 

.690 

.865 

1 .071 

1.194 

17 

, 128 

257 

-392 

.534 

-689 

.863 

1 .069 

1 . 191 

18 

. 127 

257 

392 

.534 

.688 

. 862 

1.067 

1 . 189 

19 

. 127 

257 

.391 

,533 

-088 

.861 

1 06<i 

i . 187 

20 

. 127 

, 257 

391 

1 

.533 

.687 

.860 

1 .064 

1 . 185 

21 

. 127 

257 

391 

.532 

,686 

.859 

1 063 

1 . 183 

22 i 

127 i 

256 

.390 

532 

.686 

.858 

1.001 

1 . 182 

23 : 

127 i 

.256 

.390 

.532 

.685 

.858 

1 . OGO 

1 . 180 

24 

. 127 i 

256 

390 

.531 1 

- 685 

.857 

1 . 059 

1. 179 

25 

. 127 

.256 

- 390 1 

.531 

.684 

.850 

1.058 

1.178 

26 

. 127 

256 

390 

.631 i 

,684 

.856 

1 . 058 

1.177 

27 

127 

256 

.389 

531 

.684 

.855 

1.057 

1.176 

28 

. 127 

. 256 

.380 

.530 

.683 

.855 

1 -050 

1.175 

29 

, J27 

256 

- 389 

,530 

,683 

.854 

1 , 055 

1.174 

30 

. 127 

256 

.389 

.630 

.683 

.854 

1 .055 

1.173 

40 

. 126 ; 

.255 

.388 

.629 

.681 

.851 

1.050 

1.167 

60 

. 126 

254 

.387 

.527 

.679 

.848 

1 .046 

1.162 

120 

. 126 

.254 

.386 

.526 

,677 

.845 

X .041 

1.156 

ao 

. 126 

.253 

. 385 

.624 

.674 

.842 

1 .036 

1.160 


The values in this table were taken, by permission, from Statistical 
Tables for Biological, Agricultural, and Medical Research, by R. A. Fisher 
and F. Yates, published by Oliver and Boyd, Edinburgh, and from Biometrika, 
Vol. XXXII, April 1942, p. 300, "Table of Percentage Points of 
the t-distribution," by Maxine Merrington. A table of t, similar in 
I *

of t

at Specified Levels of Significance (P)

areas:

                     Level of significance (P)

    .20     .10     .05    .025     .02     .01    .005     .001      n

  3.078   6.314  12.706  25.452  31.821  63.657  127.32  636.619      1
  1.886   2.920   4.303   6.205   6.965   9.925  14.089   31.598      2
  1.638   2.353   3.182   4.176   4.541   5.841   7.453   12.941      3
  1.533   2.132   2.776   3.495   3.747   4.604   5.598    8.610      4
  1.476   2.015   2.571   3.163   3.365   4.032   4.773    6.859      5
  1.440   1.943   2.447   2.969   3.143   3.707   4.317    5.959      6
  1.415   1.895   2.365   2.841   2.998   3.499   4.029    5.405      7
  1.397   1.860   2.306   2.752   2.896   3.355   3.832    5.041      8
  1.383   1.833   2.262   2.685   2.821   3.250   3.690    4.781      9
  1.372   1.812   2.228   2.634   2.764   3.169   3.581    4.587     10
  1.363   1.796   2.201   2.593   2.718   3.106   3.497    4.437     11
  1.356   1.782   2.179   2.560   2.681   3.055   3.428    4.318     12
  1.350   1.771   2.160   2.533   2.650   3.012   3.372    4.221     13
  1.345   1.761   2.145   2.510   2.624   2.977   3.326    4.140     14
  1.341   1.753   2.131   2.490   2.602   2.947   3.286    4.073     15
  1.337   1.746   2.120   2.473   2.583   2.921   3.252    4.015     16
  1.333   1.740   2.110   2.458   2.567   2.898   3.222    3.965     17
  1.330   1.734   2.101   2.445   2.552   2.878   3.197    3.922     18
  1.328   1.729   2.093   2.433   2.539   2.861   3.174    3.883     19
  1.325   1.725   2.086   2.423   2.528   2.845   3.153    3.850     20
  1.323   1.721   2.080   2.414   2.518   2.831   3.135    3.819     21
  1.321   1.717   2.074   2.406   2.508   2.819   3.119    3.792     22
  1.319   1.714   2.069   2.398   2.500   2.807   3.104    3.767     23
  1.318   1.711   2.064   2.391   2.492   2.797   3.090    3.745     24
  1.316   1.708   2.060   2.385   2.485   2.787   3.078    3.725     25
  1.315   1.706   2.056   2.379   2.479   2.779   3.067    3.707     26
  1.314   1.703   2.052   2.373   2.473   2.771   3.056    3.690     27
  1.313   1.701   2.048   2.368   2.467   2.763   3.047    3.674     28
  1.311   1.699   2.045   2.364   2.462   2.756   3.038    3.659     29
  1.310   1.697   2.042   2.360   2.457   2.750   3.030    3.646     30
  1.303   1.684   2.021   2.329   2.423   2.704   2.971    3.551     40
  1.296   1.671   2.000   2.299   2.390   2.660   2.915    3.460     60
  1.289   1.658   1.980   2.270   2.358   2.617   2.860    3.373    120
  1.282   1.645   1.960   2.241   2.326   2.576   2.807    3.291      ∞


arrangement to that of Appendix K, giving areas of the t distribution from the
mean to t (in one direction) and for n = 1 to n = 20, may be found in "New
Tables for Testing the Significance of Observations," by "Student," Metron, Vol.
V, No. 3 (1926), pages 114-118.
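
The tabled values can be checked numerically. The following is a minimal sketch, not the method used to compile the table: it assumes that the tabled t is the value exceeded in absolute value with probability P (both tails combined), integrates the t density by Simpson's rule, and locates t by bisection. The function names are illustrative only.

```python
import math


def t_density(t, n):
    """Density of Student's t distribution with n degrees of freedom."""
    log_c = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
             - 0.5 * math.log(n * math.pi))
    return math.exp(log_c - (n + 1) / 2 * math.log1p(t * t / n))


def two_tail_area(t, n):
    """Area in both tails beyond +t and -t (Simpson's rule on [0, t])."""
    steps = 2 * max(500, int(50 * t))      # even number of intervals
    h = t / steps
    total = t_density(0.0, n) + t_density(t, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, n)
    return 1.0 - 2.0 * (total * h / 3.0)


def t_value(n, p):
    """Value of t whose two-tail area equals p, found by bisection."""
    lo, hi = 0.0, 1.0
    while two_tail_area(hi, n) > p:        # widen until the root is bracketed
        hi *= 2.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if two_tail_area(mid, n) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(t_value(10, 0.05), 3))   # compare with the tabled 2.228
print(round(t_value(20, 0.01), 3))   # compare with the tabled 2.845
```

For very small n combined with very small P (for example n = 1, P = .001) the integration range becomes long and the sketch runs slowly; it is meant for spot checks rather than for regenerating the whole table.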
APPENDIX


Values

For Given Degrees of Freedom


This table shows
the black areas:


for n = 1 and n = 2,


Value of P

  n    .999   .995   .99   .98   .975   .95   .90   .80   .75   .70   .50

1 1 


0*157 


0‘393 


b»157 


OHVigI 

0*982 


.00393 


0l58j 


0642 


.102 


.148 

.455 

2 1 


.00200 


0100 


0201 


.040-* 


0506 


103 


211 


.446 


575 


713 

1 386 

a 


.0243 


0717 


115 


185 


21C 


.352 


684 i 

1 

005 

1 

213 

1 

424 

2 366 

4 


0008 


207 


.297 


429 


484 


.711 

1 

064 

1 

649 

1 

923 

2 

195 

3 357 

5 


.210 


.412 


.554 


.752 


.831 

1 

145 

1 

610 

2 

343 

2 

.675 

3 

000 

4,351 

6 i 


381 


676 


872 ’ 

1 

134 

1 

237 

1 

635 

2 

204 

3 

070 

3 

455 

3 

828 

5 348 

7 


598 


.989 

1 

239 

1 

564 

1 

690 j 

2 

167 

2 

833 : 

3 

822 

4 

255 

4 

671 

6.346 

8 1 


.857 

1 

344 

1 

646 

2 

032 

2 

180 

2 

733 

3 

400 ' 

4 

594 

5 

071 

5 

527 

7 344 

9 1 

1 

.152 

1 

735 

2 

088 

2 

532 

2 

700 j 

3 

325 

4 

.168 

5 

380 

5 

899 

6 

303| 

8 343 

10 1 

1 

479 

2 

156 

2 

558 

3 

059 

3 

247 ! 

3 

940 

4 

805 

6 

. 179 

6 

.737 

7, 

.2671 

9 342 


1 

834 

2 

603 

3 

053 

3 

609 

3 

810 

4 

57.5 

5 

578 

6 

989 

7 

584 

8 

148 

10 341 

12 

2 

214 

3 

074 

3 

671 

4 

178 

4 

404 

5 

226 

6 

304 

7 

807 

8 

438 

9 

034 1 

11 340 

13 

2 

617 

3 

50.5 

4 

107 

4 

765 

5 

009 

5 

892 

7 

042 

8 

634 

9 

299 

9 

920 

12 340 

H 

3 

041 

4 

0?5 

4 

060 

5 

308 

5 

629 

6 

571 

7 

790 

9 

405. 

10 

165 

10 

,821 

13.339 

15 

3 

,483 

4 

601 i 

5 

229 

5 

085 

6 

262 

7 

261 

8 

547 I 

10 

307 J 

11 

036 

11 

721 

14 330 

16 

3 

942 

5 

M2 

5 

812 

. 6 

614 

6 

908 

7 

962 

9 

312 

11 

1.52 

11 

912 

12 

624 

15 338 

17 

4 

416 

5 

097 

6 

408 

7 

255 

7 

564 

8 

072 

10 

0S5 

12 

002 

12 

792 

13 

531 

16.338 

18 

4 

905 

P 

265 

7 

015 : 

7 

906 

S 

231 

9 

.390 

10 

865 

1 2 

857 

1.1 

075 

14 

440 

17 338 

10 

5 

.407 

6 

844 

7 

633 j 

6 

.507 

8 

007 

10 

117 

n 

651 

13 

710 

14 

562 

15 

3.52 

18 338 

20 

5 

921 

7 

434 

8 

200 j 

9 

237 

9 

591 

10. 

,851 

12 

443 

14 

578 

15 

452 

10. 

200 

19.337 

21 

6 

447 

8 

034 

8 

807 

9 

915 

10 

2.83 

11 

591 

13 

240 

15 

445 

16 

344 

17. 

,182 

20 337 

22 

6 

.983 

h 

043 

9 

542 ! 

10 

COO 

10 

982 

[12 

338 

14 

041 

10 

314 

17' 

210 

18 

101 

21 337 

23 

7. 

, 529 

0 

2 CO 

10 

196 

11 

293 

11 

OSK 

■13 

091 

14 

818 

17 

187 

18 

137 

19 

021 

22 337 

24 

8 

.085 

9 

886 

10 

856 

11 

992 

12 

401 

13 

848 

lo 

0.59 

18 

0C2 

19 

0.37 

19 

943 

23 337 

2.5 

, 8 

6«19 

! 

520 

n 

524 

12 

097 

13 

120 

14 

Oil 

lb 

473 

18 

940 

19 

939 

20 

807 

24 337 

20 

0 

222 

11 

160 

12 

198 

n 

409 

1.3 

844 

15 

379 

17 

292 

19 

820 

20 

843 

21 

702 

25 336 

27 

9 

.803 

11 

808 

12 

879 

14 

12.5 

14 

,>73 

16 

1.51 

118 

114 

20 

703 1 

21 

749 

22 

719 

26 336 

28 

10 

391 

12 

401 

13 

56.=! 

14 

847 

15 

308 

lo 

928 

18 

939 

21 

588 

22 

657 

23 

647 

27 336 

20 

10, 

.986 

' 13 

121 

14 

2.50 

15 

571 1 

10 

047 

117 

708 

Il9 

708 

22 

475 

23 

567 

21 

577 

28 336 

.30 

11. 

..588 

13 

787 

M 

953 

10 

300 i 

1 16 

791 

!i8 

493 

l20 

599 

23 

304 

24 

478 

2.5 

.508 

20 336 


For values of n > 30, approximate values for $\chi^2$ may be obtained from the expression

$$n\left[1 - \frac{2}{9n} \pm \frac{x}{\sigma}\sqrt{\frac{2}{9n}}\right]^3,$$

where $x/\sigma$ is the normal deviate cutting off the corresponding tails of a normal distribution. If $x/\sigma$ is taken
at the 0.02 level, so that 0.01 of the normal distribution is in each tail, the expression yields $\chi^2$ at the
0.99 and 0.01 points. For very large values of n, it is sufficiently accurate to compute $\sqrt{2\chi^2}$, the
distribution of which is approximately normal around a mean of $\sqrt{2n - 1}$ and with a standard deviation
of 1.
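
The two approximations described in the note above can be sketched as follows. This assumes that the first expression is the cube-root (Wilson-Hilferty) form $n[1 - 2/(9n) + z\sqrt{2/(9n)}]^3$, with z the signed normal deviate for the required tail, and that P denotes the area in the upper tail. Both function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_cube_root(n, upper_tail_p):
    """Cube-root (Wilson-Hilferty) approximation to the chi-square point."""
    z = NormalDist().inv_cdf(1.0 - upper_tail_p)   # signed normal deviate
    a = 2.0 / (9.0 * n)
    return n * (1.0 - a + z * math.sqrt(a)) ** 3


def chi2_sqrt_rule(n, upper_tail_p):
    """Cruder rule: sqrt(2 * chi2) is roughly normal, mean sqrt(2n - 1), s.d. 1."""
    z = NormalDist().inv_cdf(1.0 - upper_tail_p)
    return (z + math.sqrt(2.0 * n - 1.0)) ** 2 / 2.0


# Check at the last tabled row, n = 30, P = .05 (tabled value 43.773).
print(round(chi2_cube_root(30, 0.05), 2))
print(round(chi2_sqrt_rule(30, 0.05), 2))
```

At n = 30 the cube-root form agrees with the tabled figure to about the second decimal place, while the square-root rule is off by a few tenths, which is consistent with the note reserving the latter for very large n.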
for n ≥ 3,


Value of P

  .30   .25   .20   .10   .05   .025   .02   .01   .005   .001     n

1.074 

1. 

323 

1 

642 

2 

706 

3 

841 

5 

024 

6 

412 

6. 

535 ! 

7. 

870 

10 

827 

1 

2.408 

2. 

773 

3 

210 

4 

605 

5 

001 

7 

378 

7 

824 

9 

2101 

10. 

507 

13 

815 

2 

8 665 

4. 

108 

4 

642 

6 

251 

7 

815 

0 

348 

9 

837 

11 

345 

12. 

838 

16 

268 

8 

4.878 

5. 

385 

5 

980 

7 

779 

9 

488 

11 

143 

11 

068 

13 

277 

14. 

860 

18 

465 

i 

6.064 

6. 

626 

7 

280 

9 

236 

11 

070 

12 

832 

13 

388 

16 

086 

16. 

760 

20. 

517 

5 

7.231 

7 

841 

8 

558 

10 

645 

12 

502 

14 

449 

15 

033 

16 

812 

18 

648 

22 

457 

0 

8.38C 


OiW 

9 

803 

12 

017 

14 

067 

16 

013 

16 

622 

18. 

475 

20 

278 

24 

322 

7 

9.524 

lo! 

210 

11 

030 

13 

362 

15 

507 

17 

535 

IS 

168 

20 

090 

21 

955 

26 

125 

8 

10.656 

11 

380 

12 

242 

14 

684 

16 

919 

19 

023 

19 

679 

21 

666 

23 

589 

27 

877 

9 

11,781 

12. 

549 

13 

442 

15 

987 

18 

307 

20 

483 

21 

161 

23. 

209 

25 

IS 8 

29 

588 

10 

12 890 

13. 

701 

14 

631 

17 

275 

19 

676 

21 

020 

22 

618 

24 

726 

26 

767 

31 

264 

11 

14 011 

14 

845 

16 

812 

18 

540 

21 

026 

23 

337 

24 

054 

26 

217 

28 

300 

32 

909 

13 

15.119 

15. 

984 

16 

085 

19 

812 

22 

362 

24 

736 


472 

27 

688 

29 

819 

34 

528 

13 

16.222 

17 

117 

18 

151 

21 

064 

23 

685 

26 

119 

26 

873 

29 

141 

31 

319 

36 

123 

14 

17.322 

18 

245 

19 

311 

22 

307 

24 

996 

27, 

488 

28 

259 

30 

678 

32 

801 

37 

697 

15 

18.418 

19. 

369 

20 

465 

23 

542 

26 

296 

28 

845 

20 

633 

32 

000 

34 

267 

30 

252 

16 

,19.511 

20 

489 

21. 

015 

24 

769 

27 

587 

30 

191 

1 30 

995 

33 

409 

35 

.718 

40 

790 

17 

20 GOl 

21. 

605 

22 

760 

25. 

980 

28. 

860 

31. 

526 

! 32 

340 

34 

805 

37 

156 

42 

.312 

18 

21 680 

22 

718 

23 

.900 

27 

204 

30 

144 

32 

852 

1 33 

.687 

36 

191 

38 

582 

43 

.820 

10 

22.776 

23 

828 

25 

.038 

28. 

412 

31. 

.410 

34 

170 

1 35. 

020 

37. 

566 

^9 

.997 

45 

.315 

20 

23.858' 

24 

935 

20 

.171 

29 

616 

32 

671 

35 ’ 

479 

30 

343 

38 

932 

1 

401 

46 

.797 

21 

24.939 

26 

.039 

27 

301 

30 

813 

33 

024 

36 

781 

37 

659 

40 

289 

A'c 

790 

48 

268 

22 

26.018 

27 

. 141 

28 

429 

32 

007 

35 

172 

38 

076 

38 

968 

41 

638 

44 

181 

49 

728 

23 

27.090 

28 

241 

29 

.553 

33 

196 

36 

415 

39 

364 

40 

270 

42 

98C 

45 

558 

61 

179 

24 

28.172 

20 

.339 

30 

675 

34 

382 

37 

.652' 

40 

646 

41 

566 

44 

.314 

46 

.928 

62 

,620 

25 

29.246 

30 

434 

31 

.795 

35 

563 

38 

885 

41 

.923 

42 

.856 

45 

.642 

48 

290 

64 

052 

26 

30.310 

31 

.528 

32 

012 

36 

741 

40 

113 

43 

194 

44 

140 

'46 

.963 

49 

645 

55 

.476 

27 

31.391 

32 

.620 

34 

027 

37 

QIC 

41 

337 

44 

461 

45 

410 

48 

278 

60 

.093 

56 

803 

28 

32.461 

33 

.711 

35 

.139 

30 

.087 

42 

,557 

45 

722 

46 

693 

49 

588 

62 

336 

58 

.302 

29 

33.530 

34 

.800 

36 

250 

40 

250 

43 

.773 

46 

.979 

47 

902 

50 

802 

53 

672 

50 

.703 

30 


This table is taken by consent from Table IV of Statistical Tables for Biological, Agricultural,
and Medical Research, by R. A. Fisher and F. Yates, published by Oliver and Boyd, Edinburgh;
from Biometrika, Vol. 32, pp. 187-191, "Table of Percentage Points of the χ² Distribution," by
Catherine M. Thompson; and from Biometrika, Vol. 40, p. 421, "99.9 and 0.1% Points of the χ²
Distribution," by T. Lewis. The values shown in Miss Thompson's table (and the values at
the 0.001 point as well) may also be found in E. S. Pearson and H. O. Hartley, Biometrika Tables
for Statisticians, Volume I, Cambridge University Press, Cambridge, 1954, pp. 130-131.
APPENDIX


Values of χ²/n for Use in Determining*


This table shows
the black areas:


                      Lower points

  n    .001    .005    .01    .025    .05    .10    .25    .50

kBHiii 

.0*157 

.0*3927 

.0*1571 

.0*9821 


■■■niK vTi 


.4549 


.001000 

.005013 

.01005 

.02532 




.6931 


.008099 

.02391 

.03823 

.07193 




.7887 


.02270 

.05175 

.07428 

.1211 

.1777 

.'fl 


.8392 . 


.04304 

.08235 

.1109 

.1662 

.2291 

.3221 


.8703 


.06351 

1126 

.1453 

.2062 

.2726 

.3674 


.8914 


.08550 

1413 

.1770 

.2414 

. 3096 

. 4047 


.9065 



.1681 

.2058 

.2725 

.3416 

. 4362 


.9180 


.1280 

.1928 

.2320 

.3000 

.3695 

.4031 


.9270 


.1479 

.2156 

.2558 

.3247 

.3940 

.4865 

.0737 

.9342 

11 

.1667 

.2367 

.2776 

.3469 

.4159 

.5071 

.6895 

.9401 

12 

.1845 

.2562 

.2975 

.3070 

.4;155 

.6253 

.7032 

.9450 

13 

2013 

.2742 

.3159 

.3853 

.4532 

.5417 

.7153 

,9492 

14 

.2172 

.2910 

.3329 

.4021 

.4693 

. 5564 

7201 

.9528 

15 

.2323 

,3067 

.3486 

.4175 

.4841 

,5698 


.9559 

16 

.2464 

.3214 

.3633 

.4317 

.4976 

. 5820 


. 9587 

17 

.2598 

.3,351 

.3769 

.4450 

.5101 

. 6932 

. 752.5 

.9611 

18 

,2725 

.3480 

.3897 

.4573 

5217 

6036 

.7597 

.9632 

19 

.2846 

.3602 

.4017 

.4688 

5325 

.6132 

.7664 

.9651 

20 

.2961 

.3717 

.4130 

. 4795 

.5425 

.6221 

.7726 

.9669 

21 

,3070 

,3826 

, .4237 

.4897 

.5520 

6305 

.7783 

.9684 

22 

.3174 

.3929 

.4337 

,4092 

.5608 

.6382 

.7836 

.9099 

23 

.3274 


4433 

. 5082 

..5092 

.6456 

.7886 

.9712 

24 

.3309 " 

.4119 

.4524 

.5167 

.5770 

.6524 

.7932 

.9724 


3460 


.4610 

.5248 • 

.5845 

.6589 

.7076 

.9735. 

26 

.3547 

.4292 

.4692 

.5325 

.5915 

.6651 

.8017 

.9745 

27 

.3631 

4373 

.4770 

.5398 

.5982 

.6700 

.8055 

.97.54 

28 

.3711 


.4845 

.5467 

.6046 

.0764 

.8092 

.9763 

29 

.3788 

.4525 

.4916 

.5533 

.6106 

.6816 

.8126 

,9771 

30 

3863 

.4596 

.4984 

.5597 

.6164 

6866 

.8159 

.9779 


4479 

.6177 

.5541 

.6108 

6627 

.7263 

.8415 

.9834 


.4935 

. 5598 

.5941 

.6471 

.6953 

,7.538 

.8588 

.9867 



.5922 

6247 

.6747 

.7198 

.7743 

.8716 

.9889 


5577 

.6182 

.6492 

.6965 

.7391 

.7904 

.8814 

.9905 

80 

.5815 

.6396 

.6092 

.7144 

.7549 

.8035 

.8893 

.9917 

90 

6017 

.6577 

.6862 

.7294 

.7681 

.8143 

.8958 

.9926 

100 

0192 

.6733 

,7006 

.7422 

.7793 

.8230 

.9013 

.9933 

flO 

1 0000 

1.0000 

1.0000 

1,0000 

1.0000 

1.0000 

1.0000 

1.0000 

1h 

9 

-3 0902 

-2.5758 

-2 3263 1 

- 1 . 9600 

-1 6449 

-1.2816 * 

1 

- .6745 

0 


* When n > 30, values of $\chi^2/n$ may be approximated by use of the expression

$$\left[1 - \frac{2}{9n} + \frac{x}{\sigma}\sqrt{\frac{2}{9n}}\right]^3$$







K 


Sampling Limits of 


and 



Upper points

  .25    .10    .05    .025    .01    .005    .001     n

1.323 

2.706 

3 841 

5 024 

6.035 

7.879 

10.827 

1 

1.3d6 

. 2 303 

2.996 

3 689 

4.605 

5 298 


2 

1.3G9 

2 084 

2 605 

3 116 

3 782 

4.279 

5 423 

3 

1.346 

1.945 

2.372 

2.786 

3.319 

3.715 

4 616 

4 

1.325 

1.847 

2 214 

2 566 

3,017 

3.350 

4 103 

5 

1.307 

1.774 

2 099 

2 408 

2 802 

3,091 

3 743 

6 

1.291 

1 717 

2 010 

2.288 

2 639 

2.897 

3.475 

7 

1.27 V 

1 670 

1 938 

2.192 

2.611 

2.744 

3 266 

8 

1.265 

1 632 

1 880 

2 114 

2.407 

2.621 

3.097 

g 

1.255 

1.590 

1 831 

2.048 

2 321 

2 519 

2 959 

10 

1.246 

1 570 

1.789 

1 993 

2.248 

2 432 

2 842 

11 

1.237 

1 546 

1 752 

1 945 

2 185 

2.358 

2 742 

12 

1.230 

1 524 

1 720 

1.903 

2.130 

2 294 

2 656 

13 

1.223 

1 505 

1 692 

1.860 

2.082 

2 237 

2 580 

14 

1.216 

1.487 

1.660 

1.833 

2.039 

2 187 

2.513 

15 

1.2U 

1 471 

1 644 

1 803 

2.000 

2.142 

2 453 

16 

1.205 

1 457 

1 623 

1.776 

1 965 

2 101 


17 

1.200 

1.444 

1 604 

1 751 

1 934 

2 064 

2 351 

18 

1.196 

1.432 

1 586 

1.729 

1 905 

2.031 

2 306 

19 

1.191 

1.421 

1 571 

1 708 

1 878 

2 000 

2.266 

20 

1.117 

1 410 

1 550 

1 689 

1 854 

1.971 

2 228 

21 

1.114 

1 401 

1 542 

1.672 

1.831 

1.945 

2 194 


ft 

1 392 

1.529 

1 655 1 

1 810 

1 921 

2.102 


^ 1.177 

1 383 

1.517 

1 640 

1 791 

' 1.898 

2.132 


1.174 

1 375 

1 506 


1 773 

1 877 

2 105 


1,171 

1.368 

1 496 

1 612 

1.765 

1 857 

2.079 


1 168 

1.361 

1.486 


1.739 

1 839 

2 055 


1.165 

1.354 

1 476 

1.588 

1 724 

1.821 

2.032 


1.162 

1.348 

1 467 

1 577 

1 710 

1.805 

2 010 


1.160 

1.342 

1 459 

1 666 

1.696 

1.789 

1 990 

^ESB 

1.140 

1.295 

1.394 

1 484 

1.592 

1.669 

1 835 


1.127 

; 1.263 

1 350 

1 428 

1 623 

1 590 

1.733 

50 

1.116 

1.240 

1 318 

1.388 

1.473 

1.533 

1.660 

60 

1.108 

1 222 

1.293 

1 357 

1 435 

1,489 

1 605 

70 

1.102 

1 207 

1 273 

1 333 

1.404 

1.454 

1 . 560 

SO 

1.096 

1.195 


1.313 

1 379 

1.426 

1 525 

90 

1.091 

1.185 


1 296 

1.358 

1.402 

1.494 

100 

1.000 

1 000 


1 000 

1 1.000 

1 000 


« 

+ .6745 

H- 1.2816 

+ 1.6449 

+ 1.960'’ 

+2.3203 

+2.6758 

+ 3.0002 

2* 

o 


where $x/\sigma$ is the normal deviate cutting off the corresponding tail of a normal distribution.

The values in this table were computed from values of $\chi^2$ given in the references mentioned in Appendix J
X* 

by use of the expression - — o*. 
APPENDIX


Values of n/χ² for Use in Determining


This table shows
the black areas:


                      Lower limits

  n    .001    .005    .01    .025    .05    .10    .25    .50

1 

.0924 

.1269 

.1507 

.1990 

2C03 

.3096 

.7557 

2.198 , 

2 

1448 

.1887 

.2171 

.2711 

.3338 

.4.343 

72)3 

1.443 

3 

.1844 

.2337 

.2644 

.3209 

.3839 

.4799 

7302 

1 208 

4 

2166 

.2692 

.3013 

.3590 

4216 

.5142 

7428 

1.192 

6 

.2437 

.2985 

.3314 

.3896 

.4517 

.5413 

.7546 

1.149 

6 

2672 

.3235 

.3560 

.4152 

.4765 

.5637 

.7652 

1.122 

7 

2878 

.3452 

.3789 

.4372 

.4076 

.5825 

.7746 

1.103 

8 

.3062 

.3644 

.3082 

.4502 

.5159 

.5087 

.7829 

1.089 

9 

3223 

.3815 

.4154 

.4731 

.5319 

.6129 

.7903 

1.079 

10 

.3380 

.3970 

.4300 

.4882 

.5462 

.6255 

.7909 

1.070 

11 

3518 

.4111 

.4449 

.5018 

.5591 

.6368 

.8029 

1.064 

12 

.3646 

.4240 

.4577 

.5142 

.6707 

.6469 

.8083 

1 058 

13 

.3765 

.4360 

4695 

.5256 

.5813 

.6502 

.8133 

1.054 

14 

3876 

.4470 

.4804 

.6360 

.6911 

.6046 

.8170 

1 050 

15 

3979 

.4573 

.4900 

.5457 

.6001 

,6724 

.8221 

1.040 

10 

.4076 

.4669 

. 5000 

.5647 

.6085 

.6796 

.8261 

1.043 

17 

.4168 

.4759 

.5088 

.5631 

.0162 

.6803 

.8297 

1.041 

18 

4254 

.4844 

' .5172 

.6710 

.6235 

.6926 

.8331 

1.038 

19 

.4336 

.4925 

.6260 

.5783 

.6303 

.6984 

.8363 

1 036 

20 

.4414 

.5000 

.6324 

.5853 

.6367 

.7039 

.8394 

1.034 

21 

.4487 

.5072 

.5394 

.5919 

.6428 

.7091 

.8422 

1.033 

22 

.4558 

.5141 

.5460 

1 .5981 

.6485 

.7140 

. 8449 

1.031 

23 

.4025 

.5206 

,5524 

.6041 

6539 

.7180 

.8474 

1.03r 

24 

4089 

5208 

.5584 

.6097 

.6501 

.7230 

.8498 

1.028 

25 

.4751 

.5327 

,6642 

.6151 

.6640 

,7271 

.8521 

1.027 

26 

.4810 

.5384 

,5697 

.8202 

.6688 

.7311 

.8543 

1.026 

27 

.4867 

.5439 

.5749 

.6251 

.6731 

.7349 

.8564 

1 025 

28 

4022 

.5401 

.5800 

.6298 

.6774 

.7385 

.8584 

1 024 

29 

.4974 

5542 

.6848 

.6343 

.6814 

.7419 

.8003 

1 023 

30 

5025 

.5590 

.5895 

.6386 

.6854 

.7452 

8631 

1 023 

40 

,5449 ! 

.5991 

.6280 

.0741 

.7174 

.7721 

.8769 

1,017 

50 

.5770 

.6200 

.6566 

.7001 

.7407 

.7910 

.8876 

1.013 

60 

.6024 

.6525 

.6789 

.7203 

.7587 

8005 

.8958 

1 011 

70 

. 0232 

.6717 

.6970 

.7367 

.7732 

.8185 

9023 

1 010 

80 

.6408 

.6878 

,7122 

.7503 

.7852 

.8283 

, .9077 

1.008 

00 

,6559 

.7015 

.7251 

.7618 ; 

.7954 

.8367 

.9123 

1 007 

100 

.6691 

.7134 

.7363 

.7718 

.8042 

.8439 

.9162 

1.007 

to 

1,0000 

X.OOOO 

1.0000 

1.0000 

1 0000 

1.0000 

1.0000 

1.000 

u 

w 

+3 0902 

+2.5758 

+2.3263 

+ 1.9600 

+ 1 6449 

+ 1.2816 

+ .6745 

0 



* When n > 30, values of $n/\chi^2$ may be approximated by use of the expression

$$\left[1 - \frac{2}{9n} + \frac{x}{\sigma}\sqrt{\frac{2}{9n}}\right]^{-3}$$




L

Confidence Limits of σ²



Upper limits

  .25    .10    .05    .025    .01    .005    .001     n

9 849 

G3 328 

234 32 

1,018 3 

6,306 0 

25,465 

637,000 

1 

J 3 476 

• 9 401 

10 496 

39 498 

99 501 

199 51 

999 50 

2 

2.474 

5.134 

8 520 

13 002 

20 125 

41 820 

123 47 

3 

2 081 

761 

5 023 

8 2,57 

13 4G3 

19 325 

44 051 

4 

1 869 

if . isjo 

4 30.5 

6 015 

9.020 

12 144 

23 . 785 

5 

1 737 

2 722 

3 609 

4 849 

6.S80 

8 879 

15 745 

6 

1 G45 

. 2 471 

3 230 

4 142 

5 650 

7 070 

11 696 

7 

1 578 

2 293 

2 928 

3 070 

4 859 

5 051 

9 334 

8 

1 526 

2.159 

2 707 

3 333 

4,311 

5 188 

7 813 

9 

1 484 

2 055 

2 538 

3.080 

3 909 

4.639 

6.762 

10 

1.450 

1 9f2 

2 404 

2 883 

3 G02 

4 220 

5 098 

11 

1 422 

1 904 

2 206 

2 725 

3 301 

3 904 

5 420 

12 

1 398 

1 846 

2 206 

2 595 

3 105 

3 617 

4 007 

13 

1.377 

1 797 

2 131 

2 487 

3 004 

3.436 

4 604 

14 

1 350 

1 755 

2 066 

2 3S5 

2 868 

3 200 

4 307 

15 

1 343 

1 718 

2 010 

2.316 

2 753 

3'Ul 

4.050 

16 

1 320 

1.686 

1 900 

2 247 

2 653 

2 984 

3 850 

17 

1.316 

1.657 

1 917 

2 187 

2 506 

2 873 

3 670 

18 

X . 305 

1.331 

1 878 

2 133 

2 489 

2 776 

3 514 

19 

i, 294 

1 607 

1 843 

2 085 

2 421 

2.090 

3.378 

20 

1 .85 

1 586 

1 812 

2 042 ’ 

2 300 

2 614 

3.257 

21 

. i 276 

1 567 

1.783 

2 003 

2 305 

2 545 

3 151 

: 22 

A 1 268 

1 549 

1 757 

1 908 

2 256 

2 484 

3 0.55 

23 

1 261 

1 .533 

1 733 

1.935 

2.211 

2 428 

2 909 

24 

1 254 

1 513 

1.711 

1 906 

2.109 

2 370 

2 890 

25 

1 247 

1 504 

1 691 

1 878 

2 131 

2 330 

2 819 

26 

1 241 

1 491 

1 672 

1,853 

2 097 

2 287 

2 751 

27 

1 236 

1 478 

1 0.54 

1 829 

2 004 

2 247 

2 695 

28 

1 231 

1 407 

1 038 

1 807 

2 034 

2 210 

2 G40 

29 

1.226 

1 456 

1 622 

1.787 

2 000 

2 170 

2 5S9 

30 

1.188 

1 377 

1 500 

1 637 

1 805 

1 932 

2 23.3 

40 

1.164 

1 , 327 

1 438 

1 545 

1 083 

1.780 

2 026 

50 

1.147 

1 291 

1 389 

1 482 

1 601 

1.C88 

1 890 

GO 

1 135 

1 265 

1.353 

1 436 

1.540 

1.018 

1 793 

70 

1.124 

1 245 

1.325 

1 400 1 

1.494 

1,503 

1.720 

80 

1.116 

1.228 

1 302 

1 371 

1.457 

1.520 

1 662 

90 

1.109 

1 214 

1.283 

1.347 1 

1 427 

1 485 

, 1.015 

100 

1.000 

1.000 

1 000 

1.000 

1.000 

1 000 

1 000 

«e 

.6745 1 

-1.2816 

-1.6449 

-1.9600 

-2.3263 

-2,5758 

-3.0902 

9 


where $x/\sigma$ is the corresponding normal deviate.

The values in this table were computed from values of $\chi^2$ given in the references mentioned in Appendix J,


by use of the expression 9* 


X* 


9 *. 
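
As a rough check on how the entries of this appendix relate to the χ² table, the sketch below assumes that the appendix tabulates n/χ². That is an inference from the printed figures, since the expression itself is not legible in this copy. The χ² values used are the n = 10 entries of Appendix J at P = .50 and P = .999; the function name is illustrative.

```python
def n_over_chi2(n, chi2):
    """Ratio n / chi-square for n degrees of freedom."""
    return n / chi2


# chi-square values for n = 10 as printed in Appendix J
chi2_p50 = 9.342     # P = .50
chi2_p999 = 1.479    # P = .999

print(round(n_over_chi2(10, chi2_p50), 3))    # compare with the tabled 1.070
print(round(n_over_chi2(10, chi2_p999), 3))   # compare with the tabled 6.762
```

Agreement is only to about the third decimal place, because the χ² entries used as input are themselves rounded.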
APPENDIX M

Values of F

For Given Degrees of Freedom (n₁ and n₂) and at Selected Upper Points
Values of F for corresponding lower points may be obtained by transposing the
values of n₁ and n₂ and computing 1/F.


This table shows
the black areas:


                    n₁ = 1                                    n₁ = 2

 n₂     .10     .05    .025     .01    .001       .10     .05    .025     .01    .001

1 

39 864 

161 45 

647 79 

4.052 2 

405,2.84 

49 500 

199 50 


1.999 5 


2 

8 526 

18.513 

38.506 


998 5 




99 000 

999.0 

3 

5 538 

10 128 

17 443 

34.116 

167.0 

S 462 

9 552 

16 044 


14S 5 

4 

4 545 


12.218 

21 . 198 

74 14 

4.325 

6.044 

10 649 

IS 000 

61 25 

5 

4 060 

6.608 


16.258 

47 18 

3.780 

5.786 

8 434 

13 274 

37 12 

6 

3 776 

5.987 

8.813 

13.745 

35.51 

3,463 

5 143 

7,260 

10.925 

27 00 

7 

3,5S9 

5 591 

8 073 

12 246 

29.25 

3 257 

4.737 

6 542 

9 547 

21 6'l 

8 

3 438 

5 316 

7 571 

11.259 

25 42 

3 113 

4 459 

6 060 

8 649 

18.49 

9 

3 360 

5.117 

7.209 


22 86 

3 006 

4 256 

6 715 

8 022 

18 39 

10 

3.285 

4.965 

6.937 

10.044 


2 024 

4.103 

5.456 

7 559 

H 91 

U 

3 225 

4.844 

6.724 

9 646 

19 69 

2.S60 

3.982 

5,256 

7 206 

13 81 

12 

3.176 

4 747 

6 554 


18 64 

2.807 

3 885 

5 096 

. 6 927 

12 97 

13 

3.136 

4.667 

6.414 

9.074 

17 81 

2 763 

3.806 

4.965 

6 701 

12.31 

14 

3.102 

4 600 

6.298 

8.862 

17.14 

2.726 

3.739 

4 857 

6 515 

11.78 

15 

3 073 

4,543 

6.200 

• 8.683 

16 59 

2.095 

3 682 

4.765 

6.359 

11.34 

15 

3.048 

• 4 494 

6.115 

8.631 

16 12 

2.068 

3.634 

4.687 

6 226 

10.97 

17 

3 026 

4.451 

8 042 


16 72 

2.645 


4 619 

6.112 

10.68 

18 

3 007 

1.414 

5 978 

6 285 

15 38 

2.624 

3.555 

4 560 

6 013 

10.39 

19 

2 9y0i 

4 381 

5 922 

8. 185 

15 08 

2 606 

3.522 

4.508 

5.926 

10.16 

20 

2.975 

4 351 

5.872 


14 82 

2.589 

3 493 

4.461 

6.849 

8.95 

21 

2 961 

4 325 

6 827 

8.017 

14.59 

2 575 

3.467 

4 420 

5.780 

9.77 

22 

2 949 

4 301 

5.786 

7.945 

14 38 

2 561 

3 443 

4.383 

5 719 

9.61 

23 

2 937 

4 279 


7.881 

14.19 

2.549 

3.422 

4.349 

5.664 

9.47 

24 

2 027 

4.200 

6.717 

7.823 

14.03 

2 538 

3.403 

4 319 

5.614 

9.34 

25 

2.91S 

4.242 

6.686 

7.770 

13.88 

2.528 

3.385 

4.291 

5.568 

9.22 

26 

2.909 

4.225 

5.659 

7.721 

13.74 

2.519 

3.369 

4.266 

5.526 

9.12 

27 

2.901 

4 210 

6.633 

7.677 

13.61 

2 511 

3.354 

4.242 

5 488 

9.02 

28 

2.694 

4.196 

6.610 

7.636 


2.603 

3.340 

4.220 

5. 453 

8.93 

29 

2.887 

4.183 

6.688 

7.698 

13.39 

2.496 

3.328 

4 201 

5.421 

8.85 

30 

2.881 

4,171 

6.668 

7.663 

13.29 

2.480 

3.316 

4.182 

5.300 

8.77 

40 

2.835 

4.085 

6.424 

7.314 

12.61 

2.440 

3.232 


5.178 

8.25 

60 

2.791 

4.001 

6.286 

7.077 

11.97 

2.393 

3.150 

3.925 

4.977 

7.75 

120 

2.748 

3 920 

6.152 

6.851 

11.38 

2.347 

3.072 


4.786 

7 32 


2.706 

3.841 

5.024 

6.635 

10.83 


2.996 

3.689 

4.605 

8 91 


Values of F at the 0.10, 0.05, 0.025, and 0.01 points were taken, by permission, from Biometrika, Vol. XXXIII,
APPENDIX M (Continued)


                    n₁ = 3                                    n₁ = 4

 n₂     .10     .05    .025     .01    .001       .10     .05    .025     .01    .001

1 

53 593 

215.71 

864.16 

5.403 3 

540.379 

55 833 

224 58 

899 58 

5,621 6 

562,500 

2 

0 162 

19 164 

39.165 

99. 166 

999 2 

9 243 

19 247 

39 248 

99 219 

999 2 

3 

5 391 

9.277 

15.439 

29.457 

141 1 

5 343 

9 U7 

15 101 

28 710 

137.1 

4 

4 191 

6.591 

9 979 

16 694 

50,18 

4 107 

6 388 

9.604 

15 977 

53 44 

5 

lAjljJ 

5.410 

7 754 

12.000 

33.20 

3.520 

5.192 

7 388 

11.392 

31.09 

6 

3.2b« 

4 757 

6 599 

9.779 

23.70 

3.181 

4.534 

6.227 

9 148 

21.92 

T 


4.347 

5 890 

8 451 

18.77 

2 960 

4.120 

5.523 

7.847 

17 10 

3 

2.924 


5 410 

7.591 

15.83 

2,808 

3 838 

5.053 

7.006 

14 39 

9 

2.813 


5.078 

6 992 

13.90 

2 693 

3 633 

4 718 

6.422 

12.56 

10 

2.728 

KEQi 

4.826 

6 552 

12.55 

2 605 

3 478 

4.468 

5.994 

31.28 

11 


3 5S7 

4.630 

6.217 

11.56 

2 536 

3.357 

4 275 

5.668 

10.35 

12 

2 606 

3 190 

4 474 

5.953 

10.80 

2 480 

3 259 

4.121 

5 412 

9.63 

13 

2 660 

3 410 

4 347 

5 739 

10 21 

2 434 

3 179 

3.90G 

5 205 

9 07 

14 

2.522 

3.344 

4 242 

5 564 

9.73 

2 395 

3 112 

3.892 

5.035 

8.62 

IS 

2.490 

3.287 

4.153 

6 417 

9.34 

2 361 

3 C56 

3.804 

4.893 

d.25 

16 

2,482 

3.239 

4 077 

5.292 

9,00 

2.333 

3.007 

3.729 

4.773 

7.91 

17 

2 437 

3.197 

4 011 

5.185 

8,73 

2 308 

2.965 

3.665 

4.669 

7.68 

18 

2 416 

3.160 

3 954 

5 092 

8 49 

2.296 

2.928 

3.608 

4 579 

7.46 

19 

2 397 

3.127 

3.903 

5 010 

8.28 

2.260 

2 895 

d 559 


7.26 

20 

2 380 

3 098 

3.859 

4.938 

8. 1C 

2 249 

2. 866 

3 515 

4.431 

7.10 

2! 

2.365 

3,072 

3.819 

4.874 

7,94 

2 233 

2.840 

S 475 

4.369 

6 95 

22 

2.351 

«WifCl 

3 783 

4.817 

7,80 

2 219 

2.817 

3 440 

4.313 

6.61 

23 

2.339 

3.028 

KKj 

4.765 

7 67 

2 206 

2.795 

3.408 

4.264 

6.69 

24 

2 327 

3.009 

3 721 

4.718 

7.55 

2 195 

2.773 

3.379 

4.218 

6.59 

25 

2.317 


3.694 

4,670 

7.45 j 

2 184 

2 769 

3.353 

4.177 

6.49 

26 

2. 308 

2 975 

3 670 

4 637 

7.36 

2.174 

2 743 

3.329 

4.140 

6.41 

27 

2 299 


3.647 

4.601 

7.27 

2.166 

2.728 

3 307 

4.106 

6.33 

28 

2 291 

2 917 

3 626 

4 568 

7.19 

2 157 

2 714 

3,286 

4 074 

6 25 

29 I 

2 283 

2 934 

3 607 

4.538 

7.12 

2 149 

2.701 

3 267 

4.045 

6.19 

30 ! 

2.276 

2.922 

3 589 

4.510 

7.05 

2.142 

2.690 

3 250 

4 018 

6.12 

40 

2.226 

2 839 

3.463 

4.313 

6.60 

2.091 

2 606 

3 126 

3.828 

5.70 

60 

2.177 

2 758 

3.342 

4.120 

6.17 

3 041 

2 525 

3 008 

3.649 

5.31 

120 



3.227 

3.940 

5 79 

1.992 

2.447 

2.894 

3.480 

4.95 


2.034 

1 2 605 

3.116 

3.782 

5.42 

1 945 

2 372 

2.786 

3.319 

4.62 


April 1943, pp. 73-88, "Tables of Percentage Points of the Inverted Beta (F) Distribution," by Maxine Merrington
and Catherine M. Thompson. Values of F at the 0.001 point were taken from Table V of R. A. Fisher and F. Yates,
Statistical Tables for Biological, Agricultural, and Medical Research, Oliver and Boyd, Ltd., Edinburgh, 1949, by
permission of the authors and publishers. The tables which originally appeared in Biometrika may be found also
in E. S. Pearson and H. O. Hartley, Biometrika Tables for Statisticians, Volume I, Cambridge University Press,
Cambridge, 1954, pp. 157-163. This source provided fourteen corrections for the values at the 0.001 point.
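
The rule given in the table heading for lower points (transpose the two degrees of freedom and take the reciprocal) can be sketched as below. The dictionary holds only two upper 0.05 points read from the first page of the table; the names `UPPER_05` and `lower_point` are illustrative and not part of the source.

```python
# Upper 0.05 points of F, keyed by (n1, n2), as read from the table.
UPPER_05 = {
    (1, 2): 18.513,
    (2, 1): 199.50,
}


def lower_point(upper_points, n1, n2):
    """Lower point of F for (n1, n2): reciprocal of the upper point for (n2, n1)."""
    return 1.0 / upper_points[(n2, n1)]


print(lower_point(UPPER_05, 1, 2))   # reciprocal of 199.50
print(lower_point(UPPER_05, 2, 1))   # reciprocal of 18.513
```

A fuller version would need the whole table keyed in the same way; the lookup raises `KeyError` for any pair of degrees of freedom that has not been entered.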
APPENDIX M (Continued)


                    n₁ = 5                                    n₁ = 6

 n₂     .10     .05    .025     .01    .001       .10     .05    .025     .01    .001

1 

57.241 

230.16 

921.85 

5,763.7 

570,405 

68.204 

233 99 

937 11 

5,859 0 

585,937 

2 

9.293 

19.296 

39 298 

99.299 

999.3 

9 326 

19 330 

39 331 

99 332 

099.3 

8 


9.014 

14.885 

28.237 

134 6 

5.285 

8.941 

14.735 

27.911 

132.8 

4 

4.051 

6.256 

9.364 

15 522 

51.71 

4 010 

6 163 

9 197 

15 207 

50.53 

5 

3.453 

5.050 

7.146 

10.967 

29.75 

3.404 

4.050 

6.978 

10.672 

29.84 

6 

3.10 B 

4.387 

5.988 

8.746 

20 81 

3 055 

4 284 

5.820 

8 4 G 6 

20 03 

7 

2.833 

3.972 

5 285 

7.460 

16.21 

2 827 

3.866 

5.119 

7.191 

15 52 

8 

2 726 

3.688 

4.817 

0.632 

13 49 

2 608 

3.581 

4.652 

6 371 

12 86 

2 

2.611 

3.482 

4.484 

6.057 

11.71 

2.551 

3 374 

1.320 

5 802 

11.13 

10 

2.522 

3.326 

4.236 

5.636 

10.48 

2 461 

3.217 

4.072 

5 386 

9.92 

It 

2.451 



6 316 

9.68 

2 380 

3 095 

3.881 

5 009 

9.05 

12 

2.394 

3.106 

3.891 

5 064 

8 89 

2 331 

2 996 

3 728 

4 821 

8 38 

13 

2.347 

3.025 

3.767 

4.862 

8.35 

2 283 

2 015 

3.604 

4.620 

7.86 

14 

2 307 

2.958 

3.663 

4.695 

7.92 

2 243 

2.848 

3 501 

4.456 

7.43 

15 

2.273 

2.901 

3.576 

4.556 

7.57 

2 208 

2 790 

3 415 

4.318 

7.09 

18 

2.244 

2.852 

3.502 

'4.437 

7.27 

2 178 

2.741 

3.341 

4.202 

6 81 

17 

2.218 

2.810 

3.438 

4.336 

7.02 

2 152 

2.699 

3 277 

4 102 

6.56 

18 

2 196 

2,773 

3.382 

4.248 

6.81 

2.130 

2 661 

3 221 

4 0151 

0.35 

19 

2 176 

2 740 

3.333 

4.171 

6 62 

2-109 

2.628 

3.172 

3.939 

0 18 

20 

2.158 

2.711 

3.289 

4.103 

6 40 

2.091 

2.599 

3 128 

3.871 

6.02 

31 

2.142 

2.685 

3.250 

4.042 

0.32 

2.075 

2.573 

3.090 

3.812 

5.88 

32 

2.128 


3 215 

3 988; 

0.19 

2 060 

2 549 

3 055 

3.758 

5.76 

23 

2.115 

2 640 

3.184 

3.939 

0.08 

2.047 

2 52 S 

3 023 1 

3.710 

5.65 

24 

2.103 

2.621 

3.155 

3.895 

5.98 

2.035 

2.508 

2.995 

3.667 

6.55 

25 

2 092 

2.603 

3.129 

3.855 

5.88 

2.024 

2.490 

2.069 

3 627 

5.40 

26 

2 082 

2 587 

3.105 

3.818 

5.80 

2 014 

2.474 

2.945 

3.591 

5.38 

27 

2.073 

2.572 

3.083 

3.785 

5.73 

2 004 

2.459 

2 023 

3. 558 

5.31 

28 

2.054 

2 558 


3 754 

5 06 

1 996 

2 445 

2 903 

3 528 

5.24 

39 

2.057 

2 545 

3.044 

3.725 

5.59 

1 988 

2.432 

2.884 

3.409 

5.18 

30 

2.049 

2 534 

3.026 

3 609 

5.53 

1 980 

2.421 

2.887 

3.474 

6.12 

40 

1.997 

2.450 

2.904 

3.514 

5.13 

1 927 

2 336 

2 744 

3.291 

4.73 

60 

1 046 

2 368 

2. 786 

3.339 

4.76 

1 875 

2.254 

2.627 

3.119 

4.37 

120 

1.896 

2 290 

2 674 

3.174 

4.42 

1 824 

2 175 

2 515 

2 956 

4,04 

•0 

1.847 

2.214 

2.566 

3 017 

4 10 

1 774 

2 099 

2.408 

2.802 

3.74 
APPENDIX M (Continued)


Values of F

For Given Degrees of Freedom (n₁ and n₂) and at Selected Upper Points

Values of F for corresponding lower points may be obtained by transposing the
values of n₁ and n₂ and computing 1/F.





                       n₁ = 8                                         n₁ = 12
  n₂      .10      .05     .025      .01     .001        .10      .05     .025      .01     .001
   1   59.439   238.88   956.66  5,981.6  598,144     60.705   243.91   976.71  6,106.3  610,667
   2    9.367   19.371   39.373   99.374    999.4      9.408   19.413   39.415   99.416    999.4
   3    5.252    8.845   14.540   27.489    130.6      5.216    8.745   14.337   27.052    128.3
   4    3.955    6.041    8.980   14.799    49.00      3.896    5.912    8.751   14.374    47.41
   5    3.339    4.818    6.757   10.289    27.64      3.268    4.678    6.525    9.888    26.42
   6    2.983    4.147    5.600    8.102    19.03      2.905    4.000    5.366    7.718    17.99
   7    2.752    3.726    4.899    6.840    14.63      2.668    3.575    4.666    6.469    13.71
   8    2.589    3.438    4.433    6.029    12.04      2.502    3.284    4.200    5.667    11.19
   9    2.469    3.230    4.102    5.467    10.37      2.379    3.073    3.868    5.111     9.57
  10    2.377    3.072    3.855    5.057     9.20      2.284    2.913    3.621    4.706     8.45
  11    2.304    2.949    3.664    4.746     8.35      2.209    2.788    3.430    4.397     7.63
  12    2.245    2.849    3.512    4.499     7.71      2.147    2.687    3.277    4.155     7.00
  13    2.195    2.767    3.388    4.302     7.21      2.097    2.604    3.153    3.960     6.52
  14    2.154    2.699    3.285    4.140     6.80      2.054    2.534    3.050    3.800     6.13
  15    2.118    2.641    3.199    4.004     6.47      2.017    2.475    2.963    3.666     5.81
  16    2.088    2.591    3.125    3.890     6.19      1.985    2.425    2.889    3.553     5.55
  17    2.061    2.548    3.061    3.791     5.96      1.958    2.381    2.825    3.455     5.32
  18    2.038    2.510    3.005    3.705     5.76      1.933    2.342    2.769    3.371     5.13
  19    2.017    2.477    2.956    3.631     5.59      1.912    2.308    2.720    3.297     4.97
  20    1.998    2.447    2.913    3.564     5.44      1.892    2.278    2.676    3.231     4.83
  21    1.982    2.421    2.874    3.506     5.31      1.875    2.250    2.637    3.173     4.70
  22    1.967    2.397    2.839    3.453     5.19      1.859    2.226    2.602    3.121     4.58
  23    1.953    2.375    2.808    3.408     5.09      1.845    2.204    2.570    3.074     4.48
  24    1.941    2.355    2.779    3.363     4.99      1.833    2.183    2.541    3.032     4.39
  25    1.929    2.337    2.753    3.324     4.91      1.820    2.165    2.515    2.993     4.31
  26    1.919    2.321    2.729    3.288     4.83      1.809    2.148    2.491    2.958     4.24
  27    1.909    2.305    2.707    3.256     4.76      1.799    2.132    2.469    2.926     4.17
  28    1.900    2.291    2.687    3.226     4.69      1.790    2.118    2.448    2.896     4.11
  29    1.892    2.278    2.669    3.198     4.64      1.781    2.104    2.430    2.869     4.05
  30    1.884    2.266    2.651    3.173     4.58      1.773    2.092    2.412    2.843     4.00
  40    1.829    2.180    2.529    2.993     4.21      1.715    2.004    2.288    2.665     3.64
  60    1.775    2.097    2.412    2.823     3.87      1.657    1.917    2.169    2.498     3.31
 120    1.722    2.016    2.299    2.663     3.55      1.601    1.834    2.055    2.336     3.02
   ∞    1.670    1.938    2.192    2.511     3.27      1.546    1.752    1.945    2.185     2.74
APPENDIX M (Concluded)


Values of F

For Given Degrees of Freedom (n₁ and n₂) and at Selected Upper Points

Values of F for corresponding lower points may be obtained by transposing the
values of n₁ and n₂ and computing 1/F.


                       n₁ = 24                                        n₁ = ∞
  n₂      .10      .05     .025      .01     .001        .10      .05     .025      .01     .001
   1   62.002   249.05   997.25  6,234.6  623,497     63.328   254.32  1,018.3  6,366.0  636,619
   2    9.450   19.454   39.456   99.458    999.5      9.491   19.496   39.498   99.501    999.5
   3    5.176    8.638   14.124   26.598    125.9      5.134    8.527   13.902   26.125    123.5
   4    3.831    5.774    8.511   13.929    45.77      3.761    5.628    8.257   13.463    44.05
   5    3.190    4.527    6.278    9.467    25.14      3.105    4.365    6.015    9.020    23.79
   6    2.818    3.841    5.117    7.313    16.89      2.722    3.669    4.849    6.880    15.75
   7    2.575    3.410    4.415    6.074    12.73      2.471    3.230    4.142    5.650    11.70
   8    2.404    3.115    3.947    5.279    10.30      2.293    2.928    3.670    4.859     9.33
   9    2.277    2.900    3.614    4.729     8.72      2.159    2.707    3.333    4.311     7.81
  10    2.178    2.737    3.365    4.327     7.64      2.055    2.538    3.080    3.909     6.76
  11    2.100    2.609    3.172    4.021     6.85      1.972    2.405    2.883    3.602     6.00
  12    2.036    2.505    3.019    3.780     6.25      1.904    2.296    2.725    3.361     5.42
  13    1.983    2.420    2.893    3.587     5.78      1.846    2.206    2.596    3.165     4.97
  14    1.938    2.349    2.789    3.427     5.41      1.797    2.131    2.487    3.004     4.60
  15    1.899    2.288    2.701    3.294     5.10      1.755    2.066    2.395    2.868     4.31
  16    1.866    2.235    2.625    3.181     4.85      1.718    2.010    2.316    2.753     4.06
  17    1.836    2.190    2.560    3.083     4.63      1.686    1.960    2.247    2.653     3.85
  18    1.810    2.150    2.503    2.999     4.45      1.657    1.917    2.187    2.566     3.67
  19    1.787    2.114    2.452    2.925     4.29      1.631    1.878    2.133    2.489     3.51
  20    1.767    2.083    2.408    2.859     4.15      1.607    1.843    2.085    2.421     3.38
  21    1.748    2.054    2.368    2.801     4.03      1.586    1.812    2.042    2.360     3.26
  22    1.731    2.028    2.332    2.749     3.92      1.567    1.783    2.003    2.305     3.15
  23    1.716    2.005    2.299    2.702     3.82      1.549    1.757    1.968    2.256     3.05
  24    1.702    1.984    2.269    2.659     3.74      1.533    1.733    1.935    2.211     2.97
  25    1.689    1.964    2.242    2.620     3.66      1.518    1.711    1.906    2.169     2.89
  26    1.677    1.946    2.217    2.585     3.59      1.504    1.691    1.878    2.132     2.82
  27    1.666    1.930    2.195    2.552     3.52      1.491    1.672    1.853    2.096     2.75
  28    1.656    1.915    2.174    2.522     3.46      1.478    1.654    1.829    2.064     2.69
  29    1.646    1.901    2.154    2.495     3.41      1.467    1.638    1.807    2.031     2.64
  30    1.638    1.887    2.136    2.469     3.36      1.456    1.622    1.787    2.006     2.59
  40    1.574    1.793    2.007    2.288     3.01      1.377    1.509    1.637    1.805     2.23
  60    1.511    1.700    1.882    2.115     2.69      1.292    1.389    1.482    1.601     1.89
 120    1.447    1.608    1.760    1.950     2.40      1.193    1.254    1.310    1.380     1.54
   ∞    1.383    1.517    1.640    1.791     2.13      1.000    1.000    1.000    1.000     1.00
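
The note in the heading of Appendix M, on obtaining lower points from the tabled upper points, can be sketched in Python as follows. The two upper points are copied from the n₁ = 5, n₂ = 6 row of the table; the names `UPPER_F` and `lower_point` are illustrative assumptions and do not come from the original text.

```python
# Upper points of F copied from Appendix M, keyed as
# (n1, n2, upper-tail probability). Only two entries are included here.
UPPER_F = {
    (5, 6, 0.05): 4.387,
    (5, 6, 0.01): 8.746,
}


def lower_point(n1, n2, p, table=UPPER_F):
    """Return the lower p point of F for (n1, n2) degrees of freedom.

    Following the note in the table heading: transpose n1 and n2,
    look up the upper p point, and take its reciprocal.
    """
    return 1.0 / table[(n2, n1, p)]


# Lower 0.05 point of F with n1 = 6 and n2 = 5, about 0.228.
print(round(lower_point(6, 5, 0.05), 3))
```

Only transposed pairs that actually appear in the table can be looked up this way, so a lower point for (n₁, n₂) requires that n₂ be one of the tabled values of n₁.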
APPENDIX N


Values of L₁ at the 0.05 and 0.01 Points for Specified Values
of N_s and k, when N₁ = N₂ = ··· = N_k = N_s


If L₁ has been computed from samples of varying size, take N_s equal to
(N₁ + N₂ + ··· + N_k)/k, provided that no sample consists of fewer than 15 or 20 items.


This table shows 
the black area: 




  k     N_s = 3       N_s = 4       N_s = 5       N_s = 6       N_s = 7       N_s = 8       N_s = 9
       .05   .01     .05   .01     .05   .01     .05   .01     .05   .01     .05   .01     .05   .01

2 

312 

.141 

478 

.284 

585 

.398 

656 

.486 

70.S 

.551 

745 

.603 

.775 

.646 

3 

304 

.168 

.470 

.314 

576 

.429 

648 

.614 

. 700 

.678 

-739 

.628 

.769 

.687 

4 

315 

.188 

.480 

.345 

585 

.459 

6.56 

.542 

707 

.604 

744 

.662 

774 

.689 

5 

328 

.210 

.491 

.370 

695 

.484 

665 

.565 

714 

.624 

.751 

.670 

780 

.706 

6 

.339 

.230 

,502 

.391 

,G04 

.604 

.673 

.683 

721 

.641 

.757 

.685 

.78.5 

.720 

7 

3;VJ 

.246 

512 

,409 

012 

.620 

680 

.697 

727 

.664 

763 

.697 

790 

,730 

8 

359 

.260 

.620 

.424 

620 

.534 

.686 

.810 

. 7.3.3 

.665 

.768 

.707 

795 

.740 

9 



527 

.437 

.626 

.646 

.691 

.620 

.738 

.674 

.772 

.715 

.798 

.747 

10 

. a74» 

.;kS4 

.634 

.448 

OJl 

.665 

.696 

.629 

.742 

.652 

.776 

.722 

.802 

.763 

12 

387 

.303 

.545 

.467 

641 i 

.672 

.704 

.644 

749 

.696 

782 

.734 

.807 

.764 

14 

.397 

.316 

. 554 

.431 

649 

.686 

711 1 

.655 

755 

.706 

.787 

.744 

.812 

.773 

16 

405 

.331 

501 

.493 

6.55 ; 

.696 

716 1 

.665 

.759 

.714 

.791 

.751 

.816 

.779 

18 

.412 

.343 

667 

.604 

660 

.606 

721 

.672 

703 

.721 

.795 

.756 

819 

.784 

20 

418 

.352 

573 

.512 

665 

.613 

725 

.679 

.767 

.727 

.798 

.761 

.822 

.788 

22 

.424 

.360 

. .577 

.620 

669 

.619 

728 

.684 

770 

.732 

.800 

.765 

.824 

.792 

24 

428 

.367 

581 

.626 

672 

.624 

731 

.668 

772 

.736 

802 

.768 

.826 

.796 

26 

433 

.373 

585 

.632 

675 

.629 

.734 

.693 

,775 

.740 

.805 

.772 

823 

.798 

28 

437 

.379 

589 

.637 

.678 

.634 

7.36 

.697 

.777 

.744 

807 

.776 

.829 

.808 

30 

441 

.386 

.592 

.643 

681 

.639 

.739 

.703 

779 

.748 

800 1 

.781 

831 

.80S 



  k     N_s = 10      N_s = 12      N_s = 15      N_s = 20      N_s = 30      N_s = 60      N_s = ∞
       .05   .01     .05   .01     .05   .01     .05   .01     .05   .01     .05   .01     .05   .01

2 

798 

.678 

833 

.730 

868 

.783 

.902 

.836 

935 

.820 

968 

.946 

1 OOC 

1.000 

3 

792 

.699 

828 

.748 

863 

.798 

.898 

.848 

9.33 

.893 

967 

.949 

1 000 

1.000 

4

.797 

.719 

8.32 

,766 

860 

.812 

900 

.869 

9:u 

.906 

.967 

.953 

1 000 

1.000 

5

802 

.736 

836 

.779 

,870 

.823 

. 903 

.867 

.936 

.911 

968 

.966 

l.OOO 

1.000 

6 

8C»8 

.748 

841 

.789 

873 

.832 

.906 

.874 

938 

.916 

969 

.968 

1 000 

1.000 

7 

.812 

.767 

844 

.798 

876 

.839 

.908 

.879 

. 939 

.920 

970 

.960 

1 000 

1.000 

8 

.816 

.766 

848 

.806 

879 

.844 

910 

.884 

,941 

.923 

.971 

.962 

1 000 

1.000 

9

.810 

.773 

851 

.811 

881 

.849 

.912 

.887 

.942 

.929 

.971 

.963 

1 000 

1.000 

10 

822 

.779 

853 

.816 

883 

.863 

.913 

.890 

.943 

.927 

.972 

,964 

1 000 

i.OOO 

12 

828 

.789 

857 

.824 

887 

.860 

916 

.896 

.944 

.931 

973 

.966 

l.OOO 

1.000 

14 

832 

.796 

|.86l 

.831 

890 

.866 

918 

.900 

[ 046 

.933 

.973 

.967 

1 000 

1.000 

16 

835 

.802 

863 

.836 

892 

.870 

.920 

.903 

1 .947 

.936 

.974 

.968 

l.OOO 

1.000 

18 

.838 

.807 

866 

.840 

894 

.878 

.921 

.906 

.948 

.937 

.974 

.969 

1 000 

1.000 

20 

.840 

.811 

868 

.844 

.896 

.876 

.922 

.908 

.949 

.039 

975 

.970 

1.000 

1.000 

22 

843 

.814 

870 

.847 

.897 

.878 

,924 

.909 

.950 

.040 

.075 

.970 

1.000 

1.000 

24 

.844 

.817 

872 

.860 

.898 

.880 

924 

.911 

950 

.941 

.975 

.971 

l.OOO 

1.000 

26

846 

.820 

873 

.862 

899 

.882 

925 

.912 

.951 

.042 

976 

.971 

1.000 

1.000 

28 

.848 

.823 

.874 

.864 

.900 

.884 

926 

.914 

.951 

.943 

.976 

.972 

1 000 

1.000 

30 

840 

.827 

.870 

.866 

.901 

.886 

.927 

.916 

.952 

.944 

.076 

1 .972 

1 000 

1.000 


Based on a table in “An Investigation Into the Application of Neyman and Pearson’s
L₁ Test, with Tables of Percentage Limits,” by P. P. N. Nayer, Statistical
Research Memoirs, Vol. I (1936), pp. 38-51, by permission of the author. An earlier
table of the same nature is given in “Tables for the Application of L-Tests,” by
P. C. Mahalanobis, Sankhya: The Indian Journal of Statistics, Vol. I, Part 1 (June
1933), pp. 109-122.
APPENDIX O


Upper 0.10 and 0.02 Limits of β₁
When Computed from Random
Samples from a Normal Population


This table shows
the black area:


     N     0.10     0.02
    50     .285     .619
    75     .198     .424
   100     .152     .321
   125     .123     .258
   150     .103     .216
   175     .089     .185
   200     .078     .162
   250     .063     .130
   300     .053     .108
   350     .045     .093
   400     .040     .081
   450     .035     .072
   500     .032     .065
   550     .029     .059
   600     .027     .054
   650     .025     .050
   700     .023     .046
   750     .021     .043
   800     .020     .041
   850     .019     .038
   900     .018     .036
   950     .017     .034
  1000     .016     .032
  1200     .013     .027
  1400     .012     .023
  1600     .010     .020
  1800     .009     .018
  2000     .008     .016
  2500     .006     .013
  3000     .005     .011
  3500     .005     .009
  4000     .004     .008
  4500     .004     .007
  5000     .003     .006


Taken, by permission, from a table given by Egon S. Pearson in his
article “A Further Development of Tests of Normality,” Biometrika,
Vol. XXII, pages 239 ff. A similar table for √β₁ is given in E. S.
Pearson and H. O. Hartley, Biometrika Tables for Statisticians,
Volume I, Cambridge University Press, Cambridge, 1954, p. 183.
APPENDIX P


Upper and Lower 0.05 and 0.01
Limits of β₂ When Computed from
Random Samples from a Normal
Population


This table shows
the black areas:


            Lower limits      Upper limits
     N     0.01     0.05     0.05     0.01
   100     2.18     2.35     3.77     4.39
   125     2.24     2.40     3.70     4.24
   150     2.29     2.45     3.65     4.14
   175     2.33     2.48     3.61     4.05
   200     2.37     2.51     3.57     3.98
   250     2.42     2.56     3.52     3.87
   300     2.46     2.59     3.47     3.79
   350     2.50     2.62     3.44     3.72
   400     2.52     2.64     3.41     3.67
   450     2.55     2.66     3.39     3.63
   500     2.57     2.67     3.37     3.60
   550     2.58     2.69     3.35     3.57
   600     2.60     2.70     3.34     3.54
   650     2.61     2.71     3.33     3.52
   700     2.62     2.72     3.31     3.50
   750     2.64     2.73     3.30     3.48
   800     2.65     2.74     3.29     3.46
   850     2.66     2.74     3.28     3.45
   900     2.66     2.75     3.28     3.43
   950     2.67     2.76     3.27     3.42
  1000     2.68     2.76     3.26     3.41
  1200     2.71     2.78     3.24     3.37
  1400     2.72     2.80     3.22     3.34
  1600     2.74     2.81     3.21     3.32
  1800     2.76     2.82     3.20     3.30
  2000     2.77     2.83     3.18     3.28
  2500     2.79     2.85     3.16     3.25
  3000     2.81     2.86     3.16     3.22
  3500     2.82     2.87     3.14     3.21
  4000     2.83     2.88     3.13     3.19
  4500     2.84     2.88     3.12     3.18
  5000     2.85     2.89     3.12     3.17

Taken, by permission, from a table given by Egon S. Pearson in his article “A Further
Development of Tests of Normality,” Biometrika, Vol. XXII, pages 239 ff. A similar
table is given in E. S. Pearson and H. O. Hartley, Biometrika Tables for Statisticians,
Volume I, Cambridge University Press, Cambridge, 1954, p. 184.
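
As an illustration of how Appendices O and P might be used, the sketch below computes β₁ and β₂ from a sample and compares them with the tabled limits for N = 100. It assumes the usual moment definitions β₁ = m₃²/m₂³ and β₂ = m₄/m₂², with moments taken about the sample mean and divided by N; the function names and dictionary keys are illustrative only.

```python
def central_moment(xs, r):
    """r-th moment about the mean, using divisor N (an assumption)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** r for x in xs) / n


def beta_1(xs):
    """Measure of skewness: m3 squared over m2 cubed."""
    return central_moment(xs, 3) ** 2 / central_moment(xs, 2) ** 3


def beta_2(xs):
    """Measure of kurtosis: m4 over m2 squared."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


# Limits for N = 100 copied from Appendices O and P.
BETA_1_UPPER = {0.10: 0.152, 0.02: 0.321}
BETA_2_LIMITS = {0.05: (2.35, 3.77), 0.01: (2.18, 4.39)}  # (lower, upper)


def beyond_limits_n100(xs):
    """Compare a sample of 100 items with the 0.10 and 0.05 limits."""
    b1, b2 = beta_1(xs), beta_2(xs)
    lower, upper = BETA_2_LIMITS[0.05]
    return {
        "beta_1_beyond_limit": b1 > BETA_1_UPPER[0.10],
        "beta_2_beyond_limits": b2 < lower or b2 > upper,
    }
```

The limits in the sketch apply only to samples of 100 items; for another sample size the corresponding row of each table would be substituted.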
APPENDIX Q

Squares, Square Roots, and
Reciprocals, 1-1,000


 No.  Square  Square Root    Reciprocal     No.  Square  Square Root    Reciprocal
   1       1    1.0000000   1.000000000      51    2601    7.1414284    .019607843
   2       4    1.4142136    .500000000      52    2704    7.2111026    .019230769
   3       9    1.7320508    .333333333      53    2809    7.2801099    .018867925
   4      16    2.0000000    .250000000      54    2916    7.3484692    .018518519
   5      25    2.2360680    .200000000      55    3025    7.4161985    .018181818
   6      36    2.4494897    .166666667      56    3136    7.4833148    .017857143
   7      49    2.6457513    .142857143      57    3249    7.5498344    .017543860
   8      64    2.8284271    .125000000      58    3364    7.6157731    .017241379
   9      81    3.0000000    .111111111      59    3481    7.6811457    .016949153
  10     100    3.1622777    .100000000      60    3600    7.7459667    .016666667
  11     121    3.3166248    .090909091      61    3721    7.8102497    .016393443
  12     144    3.4641016    .083333333      62    3844    7.8740079    .016129032
  13     169    3.6055513    .076923077      63    3969    7.9372539    .015873016
  14     196    3.7416574    .071428571      64    4096    8.0000000    .015625000
  15     225    3.8729833    .066666667      65    4225    8.0622577    .015384615
  16     256    4.0000000    .062500000      66    4356    8.1240384    .015151515
  17     289    4.1231056    .058823529      67    4489    8.1853528    .014925373
  18     324    4.2426407    .055555556      68    4624    8.2462113    .014705882
  19     361    4.3588989    .052631579      69    4761    8.3066239    .014492754
  20     400    4.4721360    .050000000      70    4900    8.3666003    .014285714
  21     441    4.5825757    .047619048      71    5041    8.4261498    .014084507
  22     484    4.6904158    .045454545      72    5184    8.4852814    .013888889
  23     529    4.7958315    .043478261      73    5329    8.5440037    .013698630
  24     576    4.8989795    .041666667      74    5476    8.6023253    .013513514
  25     625    5.0000000    .040000000      75    5625    8.6602540    .013333333
  26     676    5.0990195    .038461538      76    5776    8.7177979    .013157895
  27     729    5.1961524    .037037037      77    5929    8.7749644    .012987013
  28     784    5.2915026    .035714286      78    6084    8.8317609    .012820513
  29     841    5.3851648    .034482759      79    6241    8.8881944    .012658228
  30     900    5.4772256    .033333333      80    6400    8.9442719    .012500000
  31     961    5.5677644    .032258065      81    6561    9.0000000    .012345679
  32    1024    5.6568542    .031250000      82    6724    9.0553851    .012195122
  33    1089    5.7445626    .030303030      83    6889    9.1104336    .012048193
  34    1156    5.8309519    .029411765      84    7056    9.1651514    .011904762
  35    1225    5.9160798    .028571429      85    7225    9.2195445    .011764706
  36    1296    6.0000000    .027777778      86    7396    9.2736185    .011627907
  37    1369    6.0827625    .027027027      87    7569    9.3273791    .011494253
  38    1444    6.1644140    .026315789      88    7744    9.3808315    .011363636
  39    1521    6.2449980    .025641026      89    7921    9.4339811    .011235955
  40    1600    6.3245553    .025000000      90    8100    9.4868330    .011111111
  41    1681    6.4031242    .024390244      91    8281    9.5393920    .010989011
  42    1764    6.4807407    .023809524      92    8464    9.5916630    .010869565
  43    1849    6.5574385    .023255814      93    8649    9.6436508    .010752688
  44    1936    6.6332496    .022727273      94    8836    9.6953597    .010638298
  45    2025    6.7082039    .022222222      95    9025    9.7467943    .010526316
  46    2116    6.7823300    .021739130      96    9216    9.7979590    .010416667
  47    2209    6.8556546    .021276596      97    9409    9.8488578    .010309278
  48    2304    6.9282032    .020833333      98    9604    9.8994949    .010204082
  49    2401    7.0000000    .020408163      99    9801    9.9498744    .010101010
  50    2500    7.0710678    .020000000     100   10000   10.0000000    .010000000
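
Any row of Appendix Q can be recomputed directly, which is a convenient check on a doubtful entry. The sketch below is a minimal illustration in Python; the function name `table_row` is an assumption, and the rounding (seven decimals for square roots, nine for reciprocals) follows the first page of the table.

```python
import math


def table_row(n):
    """Return (n, square, square root, reciprocal) rounded as in Appendix Q."""
    return (n, n * n, round(math.sqrt(n), 7), round(1 / n, 9))


# Print a few rows for comparison with the table.
for n in (2, 51, 100):
    print(table_row(n))
```

On the later pages of the table the reciprocal column is headed ".00" and only the digits that follow are printed, so an entry such as 8333333 beside 120 stands for .008333333.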


No.   Square   Square Root   Reciprocal (prefix .00)

101 

10201 

10.0498756 

9000990 

102 

10404 

10.0995049 

9803922 

103 

10609 

10.1488916 

970S738 

104 

108 to 

10.1080390 

9615385 

105 

1 10 25 

10.2469.508 

9523810 

100 

1 1236 

10.2950301 

9433062 

107 

1 14 49 

10.3440S04 

9345704 

108 

1 1664 

10.392304B 

9259259 

109 

118 81 

10,4403005 

9174312 

no 

12100 

10.4880885 

9000909 

111 

123 21 

10.5356533 

9009009 

112 

125 44 

10.5830052 

8928571 

113 

127 60 

10.6301453 

8849558 

114 

1 29 96 

10.6770783 

8771030 

115 

132 25 

10.7238063 

8G95G52 

116 

134 56 

10.7703296 

8620090 

117 

136 so 

10.81GG538 

8547009 

118 

ISO 24 

10.8C27S05 

8474576 

119* 

14161 

10.9087121 

84033G1 

120 

144 00 

10.9544512 

8333333 

121 

14641 

11.0000000 

8264463 

122 

1 4884 

11,0453010 

8106721 

m 

. 16120 

11.0905305 

81300,81 

121 

J6S70 

11.1355287 

8001516 

125 1 

15625 

11.1803309 

8000000 

126 

1 58 76 

n. 2249722 

7936508 

127 

IQl 29 

11.2694277 

78740J6 

128 

1 63 84 

11.8137085 

7812500 

120 

166 41 

11.8578167 

7751J3H 

130 

16900 

11.4017513 

7G9230S 

131 

17161 

11.4455231 

763358S 

132 

174 21 

I 1.4891253 

7575758 

133 

17089 

11.6325620 

7518797 

134 

1 79 56 

11.6758360 

74G2GS7 

135 

182 25 

11.61 89500 

7407407 

136 

1 84 06 

U.CG1903S 

7352911 

137 

187 C9 

11.7046090 

7299270 

13S 

100 44 

U. 7473401 

7246377 

139 

103 21 

11.7808261 

7194215 

140 

106 00 

11.8321596 

7142857 

141 

1 93 81 

11.8743422 

7092199 

143 

2 0164 

11.9163753 

7012254 

143 

20449 

11.9582607 

6093007 

144 

2 07 36 

12.0000000 

6914444 

145 

2 10 25 

12.0415946 

CS9C552 

146 

213 16 

12.0830460 

6S40315 

147 

2 16 09 

12.1243557 

6S02721 

148 

210 04 

12. 1655251 

6750757 

149 

2 22 01 

12.2065556 

6711409 

160 

2 25 00 

12.2474487 

6666667 


No.   Square   Square Root   Reciprocal (prefix .00)

151 

2 2801 

12.2882057 

6622517 

152 

23104 

12.3288280 

6578947 

153 

23409 

12.3693160 

6535943 

154 

23716 

12.4096736 

6493506 

155 

2 4025 

12.4498996 

6451613 

156 

.24336 

'24649 

12.4399960 

6*110256 

157 

12.5200641 

6369427 

158 

2 49 64 

I2.5G98051 

6320114 

159 

25281 

12.6095202 

6289308 

160 

2 56 00 

12.6491106 

6250000 

161 

2 50 21 

12.6885775 

6211180 

102 

2 6244 

12.7279221 

C172S40 

163 

26569 

12,7671453 

6134969 

164 

2 08 96 

12.8062485 

6097561 

165 

272 25 

12.8452326 

6060606 

166 

27556 

12.S840937 

6024096 

167 

2 78 89 

12.9228480 

59SS024 

16S 

282 24 

12.9014814 

5052381 

169 

28561 

13.0000000 

5917160 

170 

2 89 00 

13.0384048 

5SS2353 

171 

29241 

13.0760968 

5S47953 

172 

29584 

13.1148770 

5813953 

173 

209 29 

13.1529404 

57S0347 

174 

3 02 70 

13.1909060 

5747126 

175 

80025 

13.2287566 

57I42S6 

176 

809 76 

13.2061992 

5081818 

177 

81329 

13.3011347 

5G40718 

17S 

31084 

13..3416641 

5G17978 

J70 

8 20 « 

13.3790882 

55S6592 

ISO 

32400 

13.4164079 

5555556 

181 

3 27 61 

13.4536240 

5524862 

182 

33121 

13.4007376 

5494505 

183 

33489 

13.5277193 

54G14SI 

184 

333 56 

13.5616600 

54347S3 

185 

3 4225 

13.6014705 

5405405 

iS6 

84506 

13.6381817 

5376314 

187 

34909 

1.3.6747943 

5347594 

18S 

3 53 44 

13.711.3092 

5319149 

IS9 

357 21 

13.7477271 

5201005 

190 

36100 

13.78404,88 

52G3158 

191 

364 81 

13.82027,50 

5235602 

102 

86304 

13.8561065 

5208333 

103 

87249 

13.8924440 

5181347 

194 

370 36 

13.92S38S3 

5154639 

195 

3 SO 25 

13.0612400 

512S205 

196 

8 8416 

14.0000000 

5102041 

197 

3 88 09 

14.0356688 

5076142 

103 

3 92 01 

14.0712473 

5050j05 

199 

3 90 01 

14.1067360 

5025126 

200 

400 00 

14.1421356 

5000000 
No.   Square   Square Root   Reciprocal (prefix .00)

201 

404 01 

14.1774469 

4075124 

202 

408 04 

14.2126704 

4950495 

203 

41209 

14.2478068 

4926103 

204 

41616 

14.2828569 

4901061 

205 

4 20 25 

14.3178211 

4878049 

206 

4 24 36 

14.3527001 

4864369 

207 

4 23 49 

14.3874946 

4830918 

203 

432 64 

14.4222051 

4807692 

209 

4 36 81 

14.4568323 

4784689 

210 

44100 

14.4913767 

4761905 

211 

4 45 21 

14.5258300 

4739336 

212 

449 44 

14. 5602198 

4716981 

213 

4 53 69 

14.5945195 

4694836 

214 

4 67 96 

14.6287388 

4672S97 

215 

4 62 25 

U. 6628783 

4651163 

216 

4 66 56 

14.6969385 

4629630 

217 

4 70 8S 

14.7309199 

4608295 

218 

4 75 24 

14.7648231 

4587156 

219 

479 61 

14.7986486 

4566210 

220 

4 84 00 

14.8323970 

4545455 

221 

48841 

14,8660687 

4524887 

222 

4 92 84 

14.8996644 

4504505 

223 

497 29 

14.9331845 

4484305 

224 

50176 

14.9666295 

4464286 

223 

506 25* 

15.0000000 

4444444 

226 

6 10 76 

15.0332964 

4424779 

227 

5 15 29 

15.0665192 

4405286 

228 

5 19 84 

15.0996689 

4386965 

229 

524 41 

16.1327460 

4366812 

230 

529 00 

15.1657509 

4347826 

231 

533 61 

15.1986842 

4329004 

232 1 

538 24 

15.2315462 

4310345 

233 

642 89 

15.2643375 

4291845 

234. 

547 56 

15,2970585 

4273504 

235 

5 52 25 1 

15.3297097 

4255319 

23G 

556 90 j 

15.3622915 _ 

4237288 

237 

561 60 

15.3948043 

4219409 

238 

5 66 44 

15,4272486 

4201681 

239 

6 71 21 

15.4596248 | 

4184100 

240 

67600 

15.4919334 

4166667 

241 

5 80 81 

15.5241747 i 

4149378 

242 

5 85 64 

15.5563492 

4132231 

243 

5 90 49 

15.5884573 

4115226 

244 

6 9536 

15.6204994 

4098361 

245 

60025 

15.6524758 

4081633 

246 

6 0516 

16.6843871 

4065041 

247 

61009 

15.7162336 

4048583 

248 

61504 

15.7480157 

4032258 

249 

62001 

15.7797338 

4016064 

250 

1 62500 

15.8113883 

4000000 


No.   Square   Square Root   Reciprocal (prefix .00)

251 

630 01 

15.8429795 

3084064 

252 

6 35 04 

15.8745079 

30GS254 

253 

6 40 09 

15.9050737 

3052569 

254 

6 4516 

15.0373775 

3937008 

255 

650 25 

15l 9687194 

3021569 

256 

65530 

16.0000000 

3906250 

257 

66040 

16.0312105 

3891051 

258 

6 65 64 

16.0623784 

3875909 

259 

6 70 81 

16.0934760 

3861004 

260 

6 7600 

16.1245155 

3846154 

261 

6 81 21 

16.1554944 

3831418 

262 

GS6 44 

16.1864141 

3810794 

263 

6 91 69 

16.2172747 

3802281 

264 

6 96 96 

16.2480768 

3787879 

265 

7 02 2.'? 

10.2788200 

3773585 

266 

707 66 

16,3095064 

3759393 

267 

712 80 

16.3401340 

3745318 

268 

718 24 

16.3707055 

3731343 

260 

723 61 

16.4012195 

3/17472 

270 

7 29 00 

16.4316767 

3703704 

271 

73441 

16.4620776 

3690037 

272 

739 84 

16.4924225 

3676471 

273 

7 45 29 

16.5227110 

3G33004 

274 

7 50 7G 

16.5629454 

3649635 

275 

756 25 

16.5831240 

3636364 

276 

7 6176 

16.6132477 

3623 188 

277 

7 67 29 

16.643.H70 

3610108 

273 

772 84 

16.6733320 

3597122 

279 

7 78 41 

16.7032931 

3584229 

28C 

7 84 00 

16.7332005 

3571429 

281 

7 89 61 

10.7630546 

3558719 

282 ; 

7 9524 

16.7928558 

3546099 

283 

800 89 

16.8220038 

35335G9 

284 

8 06 56 

16.8522995 

3521127 

285 

812 25 

16.8819430 

3508772 

236 

8 17 96 

16.911.5.345 

349r>503 

287 

8 23 69 ' 

16.9410743 

3484321 

288 

8 29 44 

16.9705627 

3472222 

289 

83521 

17.0000000 

3460208 

290 

84100 

17.0293864 

3448276 

291 

8 46 81 i 

17.0587221 

3436426 

292 

8 52 64 

17.0880075 

3424658 

293 

8 58 49 

17,1172428 

3412909 

294 

8 G4 30 

17.1464282 

3401361 

295 

87025 

17.1755640 ; 

333D831 

296 

876 16 

17.2046505 

3378378 

297 

8 82 09 

17.2330879 

3367003 

298 

888 04 

17.2026765 

3355705 

299 

8 94 01 

17.2916165 

3344482 

300 

900 00 

17.3205081 

3333333 
No.   Square   Square Root   Reciprocal (prefix .00)      No.   Square   Square Root   Reciprocal (prefix .00)

301 

9 06 01 

17.3493516 

3322250 


351 

12 32 01 

18.7340910 

281Q(X)3 

302 

9 1 2 04 

17.S7S1472 

3.311258 


352 

12 39 01 

18.701CG30 

2S 10909 

303 

9 18 09 

17.4008952 

3300330 


3-53 

12 40 09 

1S.78S2012 

2S.32S61 

304 

9 24 16 

17.4355958 

3289-174 


3.51 

12 53 16 

18.8148877 

2821859 

305 

0 30 25 

17.4612192 

327S6S9 


355 

12 60 25 

18.84141.37 

2816901 

300 

9 3(5 36 

17.4928557 

3267974 


356 

12 67 36 

IS. 8079023 

2808989 

307 

9 42 19 

17.5214155 

3257320 


357 

12 74 49 

18.89414.36 

2801120 

308 

9 48 G4 

17. 5199288 

3240753 


358 

12 81 64 

18.920S879 

2793296 

309 

9 54 81 

17.6783958 

3236216 


350 

12 8S81 

IS. 9472953 

2785515 

310 

96100 

17.6068109 

3225806 


360 

12 96 00 

18.0736660 

2777778 

311 

9 67 21 

17.6351921 

3215431 


36 1 

13 03 21 

19.0000000 

27700S3 

312 

9 73 44 

17.C635217 

3205128 


302 

13 1044 

19.0202976 

2762131 

313 

9 79 60 

17.691S060 

3104888 


363 

1.317 69 

19.0525580 

275-1821 

314 

9 85 96 

17.7200451 

3184713 


361 

13 24 06 

19.0787840 

2747253 

315 

9 92 25 

17.7482393 

3174GU3 


305 

13 32 25 

19.1049732 

2739726 

310 

, 9 98 56 

17.7763888 

.3161557 


3G6 

13 .39 56 

19.1311265 

2732240 

317, 

10 01 89 

17.8011938 

3154574 


367 

1.3 40 89 

19.1572111 

2724796 

318 

10 a 24 

17.8325515 

314-1654 


36H 

13 512* 

19.1S332(;i 

2717301 

319 

101761 

17.860.5711 

3134796 


369 

13 61 61 

19.2093727 

2710027 

320 

10 24 00 

17.SS85438 

3125000 


370 

13 69 00 

19.2353841 

2702703 

32 ' 

10 30 41 

17.9101729 

3115265 


371 

33 70 41 

19,2613003 

2605418 

322 

^10 36 84 

17.9143584 

3103.590 


372 

13 S3 84 

19.287.3018 

2C8S172 

323 

10 43 20 

17.9722U0S 

3005975 


373 

33 91 29 

19,3132079 

2680065 

324 

10 49 70 

IS. 0000090 

3080420 


374 

13 9876 

19.3390796 

2073707 

325 

10 56 25 

18.0277564 

3076923 


3/5 

14 06 25 

19.36^19167 

2C0C667 

3*20 

10 62 70 

18.055-4701 

3067485 


376 

1413 76 

10.3907101 

2650574 

327 

10 69 20 

' 18.0S3M13 

3058104 


.377 

14 21 29 

19.4104878 

2652520 

32B 

10 75 84 

18.1107703 

3018780 


378 

14 28 81 

19.4-122221 

2615.503 

329 

10S2 41 

18.1383571 

3039514 


379 

14 36 41 

19.4070223 

2638522 

330 

10 89 UU 

18.1059021 

3030303 


3S0 

144100 

19.4935S87 

2631579 

331 

i 10 95 61 

1 18.1934051 

3021148 


381 

M 51 Cl 

19.5192213 

2624672 

332 

11 02 24 

18.220SG72 

3012018 


382 

14 50 24 

19.5148203 

2«)I7S01 

333 

11 US 89 

18.2182870 

t>(.*0c*003 

1 

i 

383 

11 60 89 

19.5703858 

2010000 

334 

11 15 56 

18.2750669 

: 2991012 


384 

14 74 56 

19.5050179 

2601167 

335 

11 22 25 

IS. 3030052 

: 20S5O75 


oS5 

14 82 25 

19.6211169 

2.507403 

330 

11 2S90 

1S.3303U28 

1 297(J1 90 


3S6 

1189 96 

1J.01G8827 

2.50067 4 

337 

11 35 00 

18.357550?i 

! 2967359 


387 

14 97 GO 

19.6723156 

258.3079 

338 

11 42 14 

1 1S.3S.177G3 

295.s:i^0 


3S8 

15 05 41 

19.6977156 

2577320 

339 

11 49 21 

18.4119526 

29-19853 


389 

j 15 13 21 

, 19.7230829 

2570604 

3^0 

11 56 00 

18.4300889 

2941176 


390 

15 21 00 

19.7184177 

2564103 

341 

11 62 SI 

18.46(ilS53 

2932551 


391 

35 28 81 

19.7737199 

! 2557515 

342 

11 09 01 

1 18.4932420 

2923977 


392 

15 36 64 

19.7989899 

2551020 

343 

1176 49 

18.5202502 

2015152 


393 

15 44 49 

i 19.8212276 

, 2511529 

3U 

1 1 83 3() 

18.5472370 

2906977 


391 

15 52 36 

79.8494332 

2538071 

315 

1190 25 

18.5741756 

2898551 


395 

15 00 25 

19.8740069 

2531616 

346 

1197 16 

18.6010752 

2890173 


306 

15 68 16 

19.S9974S7 

2525253 

347 

12 04 09 

lS.(i279360 

2ssisa 


397 

15 76 09 

19.9248588 

2518802 

348 

12 a 04 

18.65475SI 

2S735G3 


398 

15 8404 

19.9499373 

2512563 

349 

12 18 01 

18 6815117 

2805330 


399 

15 92 01 

19.9749844 

2.5062G6 

350 

12 25 00 

IS 7082SG9 

2857143 

400 

16 00 00 

20.0000000 

2500000 






APPENDIX Q 


No. 

Square 

Square Root 

Reciprocal 

.00 

401 

10 OS 01 

20 0249S11 

249370G 

402 

1610 01 

20.0490377 

24S75G2 

403 

10 21 09 

20.0748500 

24S1390 

404 

10 32 16 

20.0997512 

M7r.24S 

405 

IG 40 25 

20.1240US 

2469136 

4UC 

IG 48 30 

20.1494417 

24G3054 

407 

10 50 40 

20.174*2410 

2457002 

40S 

10 04 1)4 

20 1990009 

24,500.80 

400 

10 72 v81 

20.29374S-i 

241408.S 

410 

10 S! 00 

‘-’0 91Rir.07 

213^^02 4 

411 

10 >0 21 

20.2731,319 

2i:]30'H) 

412 

H'l 07 i \ 

20 L’OTTnoI 

:::i27ist 

413 

17 05 00 

20.3221014 

2121308 

414 

17 13 00 

20 3103899 

24155.59 

415 

17 22 25 

2'\ 3715 188 

21'J9().>9 

410 

17 30 50 

20.39007SI 

21038 01 

417 

17 38 80 

20.42(0770 

2308082 

418 

17 47 21 

20 . 'I'lSOlRR 

230JJ44 

410 

17 55 01 

20.4004S05 

23S0fi.'5j 

420 

17 01 00 

20 4939015 

23800.52 

421 

17 72 41 

20 51S2S15 

2375297 

422 

17 SO 81 

20 61263S6 

23G90C8 

423 

17 SO 20 

20. 50G(m:3S 

23G iOGO 

421 

17 07 70 

20 5912G03 

2358491 

425 

1S(,JG25 ' 

20 Cl 55281 

2351:941 

420 

Irt 14 70 

20.0397074 

23 1743 S 

427 

18 23 20 

20 0030783 i 

2311920 

428 

18 31 SI 

29,0SS1C09 

23.35159 

420 

IS 10 11 

20 712.1152 

23.31002 

430 

18 40 no 

20. 736 1411 

2325581 

431 

IS 57 Cl 

20. 700539.5 

2320180 

432 

IS GO 24 

20,7810097 

2314H1.5 

433 

74 89 

20.89M/.520 

2300409 

431 

1 - S3 50 

20. 8320.007 

2 : 9)1147 

•i ‘f 

!S'j2 25 

20. S5t 5.536 

2298S51 

ioD ; 

I ' t 1 '1 J OOr 

20 . ssooi:u) 

22;)5578 

137 

]U no ,;.j 

20.'''n 171,50 

! 22.ss.3::o 


10 I- ] 1 

20.9‘x'S)i95 

! 22S:-:i05 

430 

10 2 : 21 

20-‘'._>2.;2dS 

1 2277 9i;l 

410 

10 30 00 

20. 07G1770 

2272727 

441 

10 44 SI 

2l,0(Wi00 

1 2207574 

442 

10 53 iA 

21 0237900 

220244:1 

44"' 

10 G2 10 

21 047.50.52 

22,57330 

444 

19 71 .30 

21.0713075 

2 ^62252 

415 

10, SO 25 

21.09502:31 

2247191 

4!a 

10 NO 10 

21.1187121 

22 421.52 

147 

10 08 no 

21. 142.3746 

‘22.37130 

44 S 

20 07 04 

21.1000105 

1 22:^2143 

410 

20 3C 01 

21. ismvjoi 

2227171 

450 

1 20 25 (;< > 

21.2132034 

‘2222222 


No. 

Square 

Square Root 

Reciprocal 

.00 

451 

20 34 01 

21.23G700G 

2217295 

452 

2U‘i:!01 

21 2902010 

•2'212:iS0 

45:1 

20 62 09 

21.2S370C7 

2207500 

451 

20 G1 IG 

21 ,30727Gi> 

22026,43 

155 

20 70 25 

21.3.307290 

2107802 

4.50) 

2U 79 36 

21.3541505 

2192982 

1.57 

20 SS 49 

21 277.':.S9 

2)SS18-t 

45S 

20 97 04 

21 .4009.340 

21 83K't6 

459 

21 OG SI 

21.42]2iG;j 

217S0-19 

4 GO 

21 16 00 

21.4170106 

217.3913 

4iU 

21 25 21 

21.47{)0Ui0 

'’1601 67 

402 

21 3144 

21.4911853 

2101502 

40:1 

21 43 69 

21 .5171,318 

21,5fi>2V 

46 i 

2i 52 in; 

21 5406:92 

21' ,172 

405 

21 02 2.5 

2l.5G.3S5.s7 

21.J(,5,3^ 

lt)0> 

21 71 v> 

21 . CGTOGol 

2 • 15923 

41 *7 

21 SOSO 

21 .r>inis2S 

21 11328 

408 

21 90 24 

21 .l‘>.j.5.3o77 

21.30752 

4t)9 

21 99 Gl 

21 -G5n lOT.s 

21.32196 

470 

22 09 00 

21,G79JS.U 

21276)60 

471 

22 18 11 

21. 702.53 M 

•J12:'.142 

472 

*22 27 84 

21. 72 55610 

211^8614 

473 

2 > 37 29 

*:i 7185632 

■211 1165 

471 

22 46 7C 

21 7715 HI 

21097(;5 

47,5 

22 56 2.5 

21 79M047 

210.5263 

476 

22 d,5 70 

21 8171242 

2 100, MO 

477 

22 75 20 1 

21.8103207 

209G43G 

47S 

22 MSI 

2^^^;321U 

2092050 

479 

22 91 11 

21.886,0080 

2uSa).S3 

4ol) 

2.3 01 00 

21. 0089023 

20.8333,3 

481 

2:5 VMn 

21 .9317122 

207!^')! 12 

4^2 

23 23 2 1 

21 0511684 

■207 16.S9 

4 S .3 

23 32 S9 

‘21.9772()1(} 

20i 1 ) »93 

4.S4 

2.3 42 .56 ! 

22 0000000 

20661 16 

4S5 

2,3 .52 25 1 

22.022715,5 

206,1856 

4 So 

23 01 9(> 

22 0151077 

20,5/1)13 

4.^7 

23 71 00 

22 , OG.S0765 

2( '5.3.3 

•1S8 1 

2 :; SI 41 

22 09(»7220 

2049 ISO 

4.V.) 

23 91 21 

22,113:1414 

2on'.!'jo 

490 

21 01 00 

22. 13.591.30 ! 

2010810 

•191 

21 10 SI 

22. 158510.8 

‘20366,60 

492 

2120 04 

22 1^^107.30 

20.32.520 

493 

213040 

2? ‘20;){’0.3.3 

2n:>.8,398 

494 

24 40 :io 

22 ‘226,1108 


495 

24 50 25 

22 ‘2185055 

2020202 

490 

24 GO 10 

22 2710575 

2010129' 

49 ; 

21 70 00 


2012072 

498 

2180 01 

22 MoGVId 

200^032 

499 

24 90 01 

22 3.3S:3)79 

20 OK 1 OH 

500 

2.5 00 00 

22 ,3(.Oi.;9S 

2O00(/00 
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Kd. 

Square 

Square Root 

Reciprocal 

.00 


No. 

Square 

Square Root 

Reciprocal 

.00 

601 

25 10 01 

22 3830293 

1096008 


551 

30 .30 01 

23.4733.S02 

1&14S82 

602 

25 20 04 

22. 10535C5 

1992032 


552 

30 47 01 

2.3 4946302 

1311594 

603 

25 30 09 

22.4270615 

19.HS072 


5.53 

30 58 09 

23 5139520 

1808313 

604 

25 40 10 

22.4499413 

1984127 


5.54 

30 09 16 

23.5372040 

180,5054 

605 

25 50 25 

22.4722051 

1980198 


555 

30 1*0 .27 

23 53^1380 

1S01802 

60G 

25 CO 36 

22.4944438 

19702.85 


550 

30 91 Jo 

23 5796322 

1V9H5()1 

607 

25 70 40 

22,6160005 

1972387 


557 

31 02 40 

23 00U8174 

170.5332 

608 

25 80 04 

22.6383563 

rj(xS504 


55S 

31 13 64 

23.0220230 

1799115 

609 

25 90 81 

22.5610283 

1004037 


559 

31 24 hi 

23 (a.iJS0S 

17SS009 

510 

26 01 00 

22.5831796 

1960781 


500 

31 30 00 

23 GO 13 101 

178:5714 

511 

2611 21 

22.6053091 

1950017 


501 

ra 17 21 

23.0S543^’6 

17S2:)3l 

612 

26 21 44 

22,6274170 

1053125 


502 

31 58 44 

2.J 700,7392 

1779359 

613 

26 31 09 

22.C496033 

1940.316 


503 

31 69 09 

23 7270210 

1 77 0 1 99 

614 

26 41 96 

22 071.6081 

191.7525 


561 

31 80 09 

23 7486812 

1773050 

615 

20 52 25 

22.C93ij114 

1941748 


565 

31 92 2t 

21 70972SC 

1709912 

616 

26 02 56 

22.7160334 

1937981 


5fi() 

32 03 56 

2J 7907515 

176>6T81 

617. 

26 72 S9 

22 7.376310 

1931230 


567 

:v> 14 89 

23 8117618 

1 7 dSoOS 

518 

20 83 24 

22.7.696134 

1930502 


508 

32 20 2't 

23 .8327.3UD 

1700-563 

619 

26 93 61 

22.7815715 

102G782 


5C9 

32 37 01 

23 853726.0 

17,5710') 

520 

27 04 00 

22.8035085 

192.3077 


570 

32 49 00 

23 ,S7.R,72M 

1751386 

621 

^'7 T4 41 

22,82542-14 

1919386 


57 1 

32 1,0 41 

23.895d0(;3 

! 7.51. '13 

522 

•27 24 81 

22.8473103 

1915709 



32 71 81 

23 9103215 

I7t8J52 

523 

27 35 29 

22.8001933 

1912016 


r)73> 

32 S,> 29 

23 9374181 

17)V0l 

521 

27 45 76 

22.S9104(i3 

1008397 


57 1 

32 9170 

23 9382971 

1712100 

525 

27 5() 25 

22 012S7S5 

1901 702 


575 

33 00 25 

23.97*0 570 


52(3 

27 lit) 7«j 

22 03i(;s09 

1001141 


57() 

33 17 70 

21 Od'p 

17301 1 1 ! 

527 

[ 27 77 29 

22,0501800 

1897533 


577 

33 29 29 

21 029'<243 

1733102 : 

528 

I 27 87 81 

22,9782500 

1803939 


57vS 

33 40 84 

24 OH 6366 

i73f lOi 

529 

27 98 41 

23.0000000 

1890.359 


579 

33 52 41 1 

24 0(/:il8S 

1727116 

530 

28 09 00 

23.02172S9 | 

lcSS0792 


5SO j 

33 0100 i 

21 OS31S!)l 

1721138 

531 

28 19 01 

23.0134372 | 

1883239 


581 

33 75 61 

24,10.39110 

1721170 

532 

28 30 24 

23.0051252 ! 

JSTOOOO 


5S2 

33 87 24 

21.1216r(,2 

171^213 

533 

28 10 SO 

23.0867928 

1870173 


583 

33 98 89 

24.14.>3929 i 

171,5206 

534 

2S 51 50 

23.10S4100 

1872059 


.584 ^ 

31 10 50 

21 ir.oo’ao 

17!.:3'29 

535 

28 02 25 

23.1300070 

ISf 19159 



31 22 25 

'.V 180:7.32 

1709102 

5;iG 

2S 72 96 

23.1510738 

1805»)72 


5s6 

3 1 33 96 

. .2074309 

iroHiiSo 

537 

i 28 83 69 

23.1732005 

1862197 


587 

3 1 4.3 69 

24.22^0820 

1703578 

538 

28 01 44 

23-104S270 

1. *<58736 


r.ss 

3 1 .57 1 ^ 

24 2)87113 

I7li3i.8<) 

539 

1 29 05 21 

23.2103735 

1855288 


589 

34 09 21 

24 . 2693222 

1097:93 

540 

29 16 00 

23.2379001 

1851852 


590 

34 81 00 

21.2>991.5C 

lO'.i 191,5 

511 

29 26 81 

23 25910()7 

1S48429 


591 

34 92 81 

21.3101910 

ir)92ia7 

542 

29 37 04 

23.2S0S935 

1845018 


592 

3501 GI 

21.3310501 

loyoyo 

513 

29 4S 49 

23.3023001 

1S4162I 


.593 

35 16 49 

24,3515913 

J(.>0311 

541 

29 59 36 

23 323^070 

1S.3S235 j 


591 

35 28 30 

21.3721 K52 

n ^3.302 

545 

29 70 25 

23.3452351 

1831862 


595 

35 40 25 

21 392021$ 

1 GSOoT 3 

546 

29 81 16 

23.3066429 

1 S3 1502 


596 

35 52 16 

24 4131112 

lt')77852 

547 

29 92 09 

23-38803U 

1S28154 

! 

597 

35 C i 09 

21.4335.8.M 

i 1)75912 

548 

30 03 04 

23.4093998 

IS24S18 


598 

35 70 04 

24.4540385 

107221! 

549 

30 14 01 

23.4307490 

1821494 


590 

35 S.S 01 

24.47a765 

l*ir>9M9 

650 

30 25 00 

23.45207SS 

1S381S2 

1 i 

30 00 OO 

24 4915974 

1606007 


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APPENDIX Q 


No. 

Square 

Square Root 

Reciprocal 

.00 

601 

G02 

603 

36 12 01 
30 2101 
36 36 09 

24.6153013 

24.6350883 

24.5500583 

166.3894 

1661130 

1058375 

GOl 

600 

COG 

36 IS 16 
36 GO 25 
36 72 36 

24.5764115 

24.606747S 

21.6170673 

1055629 

1G52S93 

1650165 

607 

COS 

609 

36 84 19 

36 96 64 

37 08 81 

24.6373700 

24.0576560 

24.0770254 

1G47446 

1644737 

1642U36 

CIO 

611 

612 

37 21 00 
37 33 21 
37 45 44 

21.0981781 

21.7184142 

24.7386338 

1030314 

16300G1 

1633987 

613 

611 

615 

37 57 69 
37 09 96 
37 82 25 

24,75SS3GS 

21.7790234 

24.7991935 

1031321 

10280G1 

1026016 

616 

617 

61S 

37 94 56 

38 00 80 
38 19 21 

24.8193473 
24 839 IS 17 
21-8596Q5S 

1023377 
16207 16 
1G18123 

619 

620 
621 

38 31 61 
38 4100 
38 56 41 

24 8797106 
24.8997992 
21*9193716 

ICI 5.509 
1612003 
16103OG 

622 

623 

621 

3S68 81 
38 M 29 
38 93 76 

24.9399278 
24.9599070 
24 9799920 

1607717 

1005130 

1002561 

620 

620 

627 

39 06 25 
30 IS 76 
39 31 20 

25 0000000 
25 0190920 
25 0399081 

1600000 
15074 1 1 
1594896 

62S 

620 

G30 

39 43 SI 
39 56 41 
39 09 60 

25.0599282 

25.07UvS721 

25.0998008 

15923.57 

15S9825 

1587302 

631 

G32 

633 

30 81 61 

39 9121 

40 06 89 

25 1197134 
25.1396102 
25-1594013 

1584786 

15S227S 

1579779 

031 

035 

03G 

40 10 56 
40 32 25 
40 44 96 

25. 17935C6 
25-1092063 
25.2190404 

15772<v 

1574S03 

1572.327 

637 

63^ 

039 

40 57 GO 
40 71)44 
40 S3 21 

25 238S5S0 
25 2580619 
25 27S4493 

1500859 

150,7398 

1561945 

610 

611 

612 

40 96 00 

41 OH SI 
412161 

25 29S2213 
2.5 3170778 
25.3377189 

l.VV2r.OO 

1560002 

1557632 

043 

641 

615 

41 31 40 
41 47 36 
41 06 25 

25.3574417 
25-3771551 
25 . 390S502 

15.5.5210 

3552795 

15503S8 

6103 

C17 

648 

41 73 1C 
4i 86 09 
41 90 01 

25.4165301 
25 4301017 
25 4058141 

35179SS 

3515595 

1543210 

649 

650 

4212 01 
42 25 00 

25-4754784 
25 4950976 

1540832 

1538462 


No. 

Square 

Square Root 

Reciprocal 

.00 

G51 

42 38 01 

25.5147010 

1536098 

652 

42 51 04 

25.6312907 

1033742 

053 


25.5538947 

16.jl394 

654 

42 77 16 

25 5734237 

1520052 

6.55 

42 90 2:> 

25.5929078 

1526718 

65G 

43 03 30. 

25.0124969 

J 53 1390 

657 

43 IG 49 

25.6320112 

1.522070 

658 

43 20 61 

25 6515107 

1519757 

059 

43 42 SI 

25 6709953 

1517451 

660 

43 56 00 

25.09040.52 

ir.I.5l52 

601 

4:? 09 21 

25.7')99203 

!519S.')9 

662 

42 S2 44 

25.7203007 

1510.574 

0G3 

•13 95 09 

25 7487861 

150S296 

6«) 1 

41 OS 96 

25 76SI97.5 

150('.!)21 

005 

44 22 25 

25. 7875939 

15937.30 

606 

44 35 50 

25 8909758 

1501502 

667 

U48S9 

25 8203)31 

1 1 992.-10 

608 

4162 24 

25 8450900 

MUi'llUG 

6G9 

4176 01 

25 86.30313 

119170S 

070 

4180 00 

2.5 SH13.",S2 

1492:.')7 

671 

45 OJ a 

25.9030677 

1 190313 

072 

45 15 84 

2) 923902S 

M'’S095 

673 

4.5 29 29 

25.9122135 

).1'<.5SS1 

671 

45 12 76 

2.5 901.51UO 

1 48.9080 

675 

45 50 25 

25 0807021 

11SM8I 

67 1> 

45 69 76 

20 0000001) 

i.179990 

67/ ! 

4.5 s:J 29 

20.01922.37 

1-17711)5 

f»7s 

4.5 90 81 

26 038.1331 

1474999 

679 

46 10 41 

26 057('284 

1 179754 

OSO 

46 2160 

20-076S090 

14705S8 

r,si 

46 37 01 

20 09.:, 9767 1 

1403129 

0s2 

46 51 21 

20.1151297 

1.100970 

6S3 

40 61 89 

20. 13420S7 

1101129 

C84 

46 7S 50 

20.1533937 

1.101933 

(iS.5 

40 92 2.5 

20.1725017 

1-1593.5.1 

OSO 

47 05 9‘) 

20 1910017 

1457720 

0S7 

47 19 69 

20.2100848 

1-45,5001 

68^; 

47 33 44 

2t> 2297541 

147)3433 

(,S9 

47 47 21 

26.24SS095 

1451379 

690 

47 61 00 

20.2078511 

14)9275 

691 

47 71 81 

20 2Sr>s7S9 

1417178 

092 

47 8S 64 

20 3058920 

1415087 

093 

■18 02 19 

20.3218^32 

1.W3001 

r,94 

4s 10 30 

20 313S797 

1410992 

695 

48 i'-O 25 

26 3028527 

113SS49 

696 

4.8 44 hi 

26 38 18119 

1-1.10732 

(>97 

-JS .iS 09 

20. 1007.576 

1434720 

698 

48 72 04 

26,4196896 

11321)05 

699 

48 SO 01 

20.4386081 

1430015 

700 

49 00 00 

20 457.5131 

1428571 
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No. 

Square 

Square Root 

Reciprocal 

.00 

701 

49 14 01 

2(i. 4704011) 

M 20534 

702 

49 28 01 

20. 4!).')2S20 

1424501 

703 

40 40 00 

20,5111472 

1422475 

701 

49 50 10 

20 5320'\83 

14201.".") 

705 

49 70 25 

2il ;>318301 

1418410 

700 

40 84 00 

2r).57l)r.805 

11104.31 

707 

49 93 49 

20 3S91716 

1414427 

70S 

50 10 ol 

20 r,()s„>(;‘)l 

1412429 

709 

50 20 81 

26.02705.39 

H 10437 

710 

50 41 00 

26 045S252 

1405451 

711 

50 55 21 

20 0fU583,3 


712 

50 69 41 

20) 0Si.i2sl 

MOilOi 

713 

50 S3 00 

26 7i)2il598 

1402.52.5 

711 

50 07 90 

20 72077 SI 

1 100560 

715 

51 12 25 

2G.7J:)1s;-;|( 

laosooi 

710 

51 20 cA 

26 75':1703 

1396(;1S 

717 

51 40 89 

2t» 7 <'6)8557 


/ 1 S« 

51 55 21 

2() 7!'.')522(J 


719 

51 09 Gl 

26 814175)4 

)300S21 

720 

51 8100 

20 8328 157 

138.S.',S‘) 

721 

51 98 41 

20 «14132 

138t»G3 


.4' J2^1 

2(). 8700577 

1385042 

723 

*52 27 *20 

26 sSsC)40:5 

J3,<!I'20 

721 

52 41 70 

26.0072481 

1381215 

725 

52 50 25) 

26.9238210 

137'.t3I0 

725 

52 70 70 

2(3 911.1872 

137711(1 

727 

52 85 20 

20 9G29375 

13 , ,»i)IG 

72S 

52 99 84 

20. 9S 147.31 

1373020 

729 

53 14 U 

27 0()0('000 

1371742 

730 

53 29 00 

27.0185122 1 

1 '.03.S03 

731 

53 43 01 

27.0370117 

13r)70?<) 

732 

53 58 21 

27.055198.5 

13GC120 

733 

53 72 SO 

27.0730727 

1301230 

734 

53 87 50. 

27,0924314 

1.3G230S 

735 

51 02 25 

27.110^8.34 

MGOdll 

73G 

51 16 96 

27 1293199 

133S&1)G 

737 

54 31 r.o 

27 1477439 

1.3.36S,'',2 

73S 

5140 14 

27.1061554 

13.3501 1 

739 

51 Gl 21 

27.1845514 

1353180 

710 

51 70 00 

27 2029410 

1.351351 

711 

54 90 81 

27.221:: 1,82 

1310523 

742 

55 0.3 04 

27.2390760 

1347709 

743 

5o 20 49 

27/2.3S0263 

1315895 

744 

55 3.3 30 

27 27G3G34 

13110sr) 

745 

55 50 25 

27.2946XS1 

13422S2 

710 

55 65 IG 

27 .3130006 

13-10 1S3 

747 

55 80 09 

27. ;33 1.3007 

1.338i')SS 

74S 

55 95 04 

27:3495887 

1336898 

740 

56 10 01 

27.3678044 

1335113 

750 

56 25 00 

27.3801279 

1333333 


No. 

Square 

Square Root 

Reciprocal 

.00 

7.51 

5G 40 01 

27.4013702 

1.331.5.58 

752 

56 55 04 

27 4220184 

i;i207S7 

753 

56 70 09 

27.4408155 

1328021 

754 

.50 8,5 IG 

27 4,590001 

1320260 


.57 00 25 

27 4772633 

1321o03 

756 

.57 1.5 30 

27.4951542 

1322751 

7.57 

57 30 49 

27.5130,330 

1321004 

7.5S 

57 45 64 

27 5317998 

1319261 

759 

57 60 SI 

27.549951G 

131752.3 

760 

57 76 00 

27.508097.) 


761 

5/ 91 21 

27 5S022St 

J3U000 

762 

5S 06 44 

27,6043475 

1312336 

76:^ 

.5.8 21 GO 

27.G22454G 

1310010 

76 4 

GS 30 'JG 

27 0105409 

130S90I 

705 

58 52 25 

27.G5803.34 

1.307190 

760 

58 or 50 

27 0707050 

1303183 

7i»7 

5S 82 89 

27.0917048 

13037H1 

703 

58 98 24 

27 7128129 

13020S3 

769 

59 13 61 

27 7.303.492 

1300390 

7/0 

59 29 00 

27 7488739 

1298701 

771 

59 4 1 41 

27 7008868 

1297017 

772 

59 59 84 

27 7.8438,80 

129.5337 

7<'’.l 

.5'/ 7.5 2') 

27.3023775 

1 203661 

771 

50 90 70 

27.8208555 

1291990 

775 

60 06 2.5 

27 83.38218 

1290323 

770 

60 21 76 

27..3.5GT7GI) 

I2X.S(i60 

777 

60 37 29 

27.8717197 

12S7001 

773 

60 .52 84 

27..S92G5M 

12S5347 

779 

CO 03 41 

27.910.5715 

12S3G07 

780 

60 8100 j 

27.9231801 

12S2051 

781 

CO 90 Gl 

27.9103772 

12S0410 

782 

61 15 24 

27. 9G 12029 

127S772 

783 

Cl 30 S9 

27.9821372 

1277139 

734 

61 40 56 

28.0000000 

1275510 

7S5 

Cl 62 2,5 

23 0178515 

1273S,X5 

7S6 

Gl 77 96 

28.0350915 

1272205 

7S7 ! 

61 93 69 

28 0535203 

1270648 

73S 

62 09 44 

28 0713,377 

1209036 

789 

02 25 2: 

28.0891138 

1 267427 

790 

62 41 00 

2.3,1009380 

12G5S23 

791 

02 50 SI 

28.1217222 

1264223 

792 

62 72 64 

28.1424940 

1 262026 

793 

62 88 49 

28 16025.57 

1261034 

794 

63 04 3(*> 

28 1780050 

1259-146 

795 

03 20 25 

28.1957441 

1257S62 

796 

63 36 IG 

2S. 213 4720 

1250281 

797 

63 ,52 00 

28.2311884 

1254705 

798 

63 as 04 

28.2488938 

1253i:i3‘ 

799 

63 84 01 

28 2605SS1 

125156'4 

SOO 

64 00 00 

28.2842712 

1250000 
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7n 


No. 

Square 

Square Root 

Reciprocal 
.00 


No. 

Square 

Square Root 

Reciprocal 

SOI 

64 10 01 

28:3019134 

1248430 


851 

72 4201 

29 1719043 

1175088 

802 

64 32 04 

2S.3t'Jl)Ul5 

12108S3 


S52 

72 59 04 

20 1890390 

1173709 

fi0.i 

64 48 09 

28.3372516 

1245330 


853 

72 76 09 

29.2061G37 

1172333 

SOi 

6461 16 

2S..S.'-)1S03S 

1243781 


851 

72 93 10 

29.2232784 

1170000 

805 

64 80 25 

28.3725210 

1242236 


855 

73 1075 

iic 2.03830 

U69.591 

800 

CJ 96 3(5 

2S.390139I 

1210605 


856 

73 27 36 

29.2574777 

11C8224 

807 

051249 

2S,4n7r4';t 

1230157 


S57 

73 44 49 

20.2715G23 

11GG861 

80» 

G5 28 IH 

28,4253108 

1237624 


858 

73 6t 64 

29.2016370 

1105501 

809 

05 44 81 

28.4429253 

1*236091 


859 

73 78 81 

29.3087013 

1104144 

810 

05 61 00 

28.46049S9 

1234568 


860 

73 90 00 

20.32575G6 

1102791 

811 

03 7T 21 

4780617 

Visaoic 


KOI 

7113 21 

20 312S015 

1161440 

812 

65 93431 

28.4950137 

1231527 


862 

74 30 41 

29.3598305 

1160093 

813 

06 09 GO 

09.S13I9I9 

1230012 


Si'.3 

7J47C9 

20.37fi.S016 

1158749 

8U 

CG 25 % 

•>s 5306S52 

1228501 


SO 4 

716190 

20.393S709 

1157407 

815 

60 42 25 

28.5182018 

1221/994 


805 

74 82 25 

20.4108823 

1156009 

SIC 

Co 5S SC 

2S..‘)Im7137 

1225490 


800 

7}99:('. 

PO 427S779 

1154731 

817 

60 7 4 SO 

2t 5S32110 

1223'^ ^0 


8(w 

75 1C 89 

29 441S(J.37 

11,53103 

818 

00 9121 

2S. 0000993 

1222401 


808 

753121 

20.41)18307 

113’2074 

810 

07 07 Gl 

2S,ClS17fiO 

1221001 


860 

7.1 51 01 

29.47SSn’10 

115074S 

820 

07 24 00 

2S.(.rit5i2l 

1210512 


870 

73 00 00 

20 4057624 

1149425 

821 

C7 10 41 

2S.C530976 

1218027 


571 

7.) 56 41 

29 51*27091 

1148106 

822 

07 56 84 

28 6705124 

I21C5r> 


872 

76 03 81 

20 5*200161 

1110789 

S23 

07 73 29 

28 Ob'^OTO'i 

121500/ 


^7.^ 

7*> 21 *20 

2'>,.')ti.573t 

1145175 

82 i 

67 89 70 

2(8,7051002 

1213502 


874 

70 ob 76 

29.5O3101O 

1141105 

825 

CS 06 25 

2S.722S132 

1212121 


875 

7o 56 25 

20.5S03080 

1112857 i 

820 1 

08 22 76 

2S. 7 102157 

1210654 


570 

76 73 70 

‘29.5972972 

111155.) 

827 , 

Cb oO 29 

2b/r57(.077 

12UJ1V(9 


577 

70 91 29 

29.0U1553 

1110251 ! 

SLS 

lA 55 S4 

2S 771MC91 

1207720 


87S 

77 OS 84 

29.6310613 

113S952 

820 

Gs 72 U 

IN 

1201.273 


,s7') 

77 26 41 

20.0179*112 

lH7i)50 

830 , 

Cb bO 00 

1 2b 8'i072H5 

120 lbl9 


ijSjO 

77 44 00 

29.0017939 

n3G.)64 

wU ' 

60 05 01 ' 

* 2^‘ S27O7t'0 

120536) 


8Sl 

77 61 61 

20.6816442 

11.35074 

812 

22 21 ! 

i 2^ sniio.i 

1201021 


8-^2 

77 79 21 ! 

20.09S131S 

1I,)37)57 

bo 3 1 

09 3S S9 : 

I 28. b«) 17(901 

r2(»04^0 


b'_J 

77 %^^ 

29. 7 153 150 

1132503 

SSI ! 

09 55 50 

2S.S7905S2 

iiooon 


8S4 

78 1 1 56 

20.7.32!375 

1131222 

Sot 

<' ‘ 72 25 

28 8961()()0 

11971.05 


5s5 

7S32 25 

'2‘\74S0196 

1129914 

SGl) 

09 90 1 

2>'.'Ji3u6K) 

1190172 


556 

78 10 90 

20.7657521 

112SG()5 

837 

70 05 00 

2S. 0300523 

1101743 


8S7 

7^67 60 

20,7S25r)2 ! 

1127.396 

80S 

70^2 44 

28.04S2267 , 

n'»lU7 


5(SS' 

75 55 41 

29.799:12S0 

112G12G 

839 

70 39 21 

2S.005iOj7 

U9lS9o 


859 

79 0321 

29.5101U30 

U21S59 

SIO 

70 56 00 ' 

28.682753,5 ' 

1100170 


800 

70 21 00 

20.8328678 

1123596 

S!1 

70 72M 

29 1)0000(10 



891 

70 3S51 

20 5406231 

1122331 

812 

70 '-9 01 

29. 9172.(03 

ilS70l8 


592 

79 56 G 4 

29.5603090 

1121070 

843 

71 06 10 

20 0311623 

Uv'.2U)' 


SO) 

70 7140 

29.8811056 

1119821 

8ii 

71 23 36 

20 n'>'i,7M 

llSlTil ' 


SOI 

70 92 .36 

29,S09S.V28 

1113.568 

815 

71 10 25 

29.0)jS'- To'/ 

1183112 


505 

bO U) 25 

29.9I(»5jOG 

1117318 

810 

71 57 16 

29 OS(>07ol 

11^2033 


806 

80 *28 16 

20 9332.591 

1116071 

817 
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APPENDIX R 


Common Logarithms of Numbers 


The common logarithm of a number (N in the table) is the power to 
which 10 must be raised to produce N. The adjective “common” indicates 
that a logarithm is to the base 10 rather than to some other base, 
for example, e = 2.71828, the base of “natural” logarithms. When the 
unmodified term “logarithm” is used, it is generally understood that 
common logarithms are meant. A logarithm is composed of two parts, 
the characteristic and the mantissa. 

The characteristic, which is always an integer or zero, is determined 
by the following rule: 

If N ≥ 1, the characteristic is positive and its value is one less than the 
number of digits in N which are to the left of the decimal point. For 
example, 

    N           Characteristic 
    4568        3 
    456.8       2 
    45.68       1 
    4.568       0 


If N < 1, the characteristic is negative and its value is one more than the 
number of zeros just to the right of the decimal point. For example, 


    N           Characteristic 
    0.4568      -1 or 9 - 10 
    0.04568     -2 or 8 - 10 
    0.004568    -3 or 7 - 10 
    0.0004568   -4 or 6 - 10 
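
The two counting rules above can be sketched in a few lines of Python. This is only an illustration, not part of the original appendix: the function names `characteristic` and `tens_form` are hypothetical, and the number is passed as a string so that its digits are counted exactly as written.

```python
from decimal import Decimal


def characteristic(n: str) -> int:
    """Characteristic of the common logarithm of n, found by counting digits.

    `n` is given as a string (an assumption made here) so the digits are
    read exactly as printed, with no binary rounding.
    """
    d = Decimal(n)
    if d <= 0:
        raise ValueError("logarithms are defined only for positive numbers")
    if d >= 1:
        # One less than the number of digits to the left of the decimal point.
        return len(str(int(d))) - 1
    # N < 1: negative, and one more than the number of zeros
    # just to the right of the decimal point.
    fraction = format(d, "f").split(".")[1]
    leading_zeros = len(fraction) - len(fraction.lstrip("0"))
    return -(leading_zeros + 1)


def tens_form(c: int) -> str:
    """Write a negative characteristic in the '9 - 10' style of the table."""
    return str(c) if c >= 0 else f"{c + 10} - 10"


for n in ("4568", "456.8", "45.68", "4.568",
          "0.4568", "0.04568", "0.004568", "0.0004568"):
    c = characteristic(n)
    print(n, c, tens_form(c))
```

The loop simply walks through the eight values of N used in the two examples, printing each characteristic in both its plain and its "minus 10" form.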


The mantissa, which is always a decimal or zero, is obtained from a 
table such as that which follows. The mantissa is the same for any given 
combination of digits no matter where the decimal point may be placed. 
Thus, for all of the eight values of N just listed, the mantissa is 0.659726. 

Combining the characteristic and the mantissa gives the logarithm. 
For the eight values of N given above, 


    N           Logarithm 
    4568        3.659726 
    456.8       2.659726 
    45.68       1.659726 
    4.568       0.659726 
    0.4568      9.659726 - 10 
    0.04568     8.659726 - 10 
    0.004568    7.659726 - 10 
    0.0004568   6.659726 - 10 
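
As a rough check on the list above, the following Python sketch splits a computed common logarithm into its characteristic and mantissa. It is an illustration under stated assumptions, not the procedure of the appendix: it uses the standard-library function `math.log10` in place of a table lookup, and the helper name `split_log` is hypothetical.

```python
import math


def split_log(n: float) -> tuple[int, float]:
    """Return (characteristic, mantissa) of the common logarithm of n."""
    log = math.log10(n)
    characteristic = math.floor(log)   # integer part, negative when n < 1
    mantissa = log - characteristic    # decimal part, always in [0, 1)
    return characteristic, mantissa


for n in (4568, 456.8, 45.68, 4.568, 0.4568, 0.04568, 0.004568, 0.0004568):
    c, m = split_log(n)
    print(f"{n:<10} characteristic {c:>2}   mantissa {m:.6f}")
```

Moving the decimal point changes only the characteristic; the mantissa is the same for every value, which is why a single table of mantissas serves for numbers of any size.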
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030600 

031004 

031408 

031812 
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033021 

404 
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033424 

033826 

4227 

4628 

5029 

5430 
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6230 
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8442 
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390 
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9218 

9606 

9993 

050380 
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051153 

051538 

051924 

052309 

052694 

336 

3 

05307B 

0534G3 

053846 

4230 

4613 

4995 

53/8 

5760 

6142 

6524 

383 

4 

6905 

7286 

7666 

8046 

8426 

8805 

9185 

9563 

9942 

060320 

379 

115 

060698 

061075 

06145? 

061829 

062206 

062582 

062958 

063333 

063709 

4083 

376 

6 

4458 

4832 

5206 

5580 

5953 

6326 

6699 

70/1 

7443 

7815 

373 

7 

8186 

8557 

8926 

9298 

9668 

070038 

070407 

070776 

071145 

071514 

370 

8 

071882 

072250 

072617 

072985 

073352 

3718 

40BS 

4451 

4816 

5182 

366 

9 

5547 

5912 

6276 

6640 

70C4 

7368 

7731 

8094 

8457 

8819 

363 

120* 

079181 

079543 

079904 

080266 

080626 

090987 

081^^7 

081707 

082067 

082426 

360 

1 

082785 

083144 

083503 

3861 

4219 

4576 

4'ii4 

5?91 

564/ 

6004 

357 

2 

6360 

6716 

7071 

7426 

7781 

8136 

8490 

8845 

9193 

9552 

355 

3 

9905 

090258 

090611 

090963 

091315 

091667 

092018 

092370 

092721 

0930/1 

352 

4 

093422 

3772 

4122 

4471 

4820 

5169 

5518 

5866 

6215 

6562 

349 

125 j 

6'j10 

7257 

7604 

7951 

8298 

8544 

3390 

9335 

9681 

100026 

346 

6^ 

100371 

100715 

101059 

101403 

101747 

107091 

102434 

102777 

103119 

3462 

343 

7 

3804 

4146 

4487 

4828 

5169 

5510 

5851 

6101 

6'^Jl 

6871 

341 

8 

7210 

7549 

7868 

8227 

8565 

8903 

9241 

9579 

991G 

110253 

33B 

9 

110590 

110926 

111263 

111599 

111934 

112270 

112605 

112940 

113275 

3609 

335 

130 

113943 

114277 

114611 

114944 

115278 

115611 

115943 

116276 

116608 

116940 

333 

1 

7271 

7603 

7934 

8265 

8595 1 

8926 

9236 

9536 

9915 

120245 

330 

2 

120574 

120903 

121231 

121560 

121888 1 

122216 

122544 

122871 

123198 

3525 

328 

3 

3852 

4178 

4504 

4030 

5156 1 

5481 

5806 

6131 

6456 

6781 

325 

4 

7105 

7429 

7753 

8076 

8399 

8722 

9045 

9363 

9690 

130012 

323 

135 

130334 

130655 

130977 

131298 

131619 

131939 

132260 

132580 

132900 

3219 

321 

6 

3539 

3858 

4177 

4496 

4314 

5133 

5451 

5769 

6066 

6403 

318 

7 

6721 

7037 

7354 

7671 

798/ 

8303 

8618 

8934 

9249 

9364 

i 316 

8 

9879 

140194 

140508 1 

140822 1 

141136 

141450 1 

141763 

142076 

142389 

142702 

314 

9 

143015 

3327 

3635 1 

3951 j 

4263 

4574 

4885 

5196 

5507 

5816 

311 

140 

146128 

146438 

146748 

147058 

147367 

147676 

147985 

148294 

148603 

148911 

309 

1 

9219 

9527 

9835 

150142 

150449 

150756 

151053 

151370 

151676 

151992 

307 

2 

152288 

152594 

152900 

3205 

3510 

3815 1 

4120 

4-124 

4728 

5032 i 

305 

3 1 

5336 

5640 

5943 

6246 

6549 

6852 i 

7154 

7457 

7759 1 

8061 

303 

4 

8362 

8564 

8965 j 

9266 

9567 

9868 

T60168 

100469 

160769 1 

161068 

301 

145 

161368 

161667 

161967 

162266 

162564 

162863 

3T6I 

3460 

3758 

4055 

299 

6 

4353 

4650 

4947 

5244 

5541 

5838 

6134 

6430 

6726 

7022 

297 

7 

7317 

7613 

7908 

8203 

8497 

8792 

S086 

9380 

9674 

9968 

295 

8 

170262 

170555 

170848 

171141 

171434 

171726 

172019 

172311 

172603 

172895 

293 

9 

3186 

3478 

3769 

4060 

4351 

j 4641 

4932 

5222 

5512 

5802 

291 

160 

176091 

176381 

1766/0 

176959 

177248 

' 177536 

177825 

178113 

178401 

178689 

289 

1 

8977 

9264 

9552 

9839 

180126 

180413 

160699 

180986 

181272 

181558 1 

287 

2 

181844 

182129 

182415 

182700 

2985 

3270 


3839 

4123 

4407 1 

285 

3 

4691 

4975 

5259 

5542 

5825 

6108 

6391 

6674 

6956 

7239 1 

283 

4 

7521 

7803 

8084 

8366 

8647 

8928 

9209 

9490 

9771 

190051 1 

281 

155 

190332 

190612 

19089?. 

191171 

191451 

191730 

192010 

192289 

192.567 

2846 I 

279 

6 

3125 

3403 

3681 

! 3959 

4237 

4514 

4792 

5069 

5346 

5623 ! 

278 

7 

5900 

6176 

6453 

1 6729 

7005 

7281 

75S6 

7832 

8107 

8382 i 

276 

8 

8657 

8932 

9206 

9481 

9755 

200029 

200303 

200577 

200850 

201124 1 

274 

9 

201397 

201670 

201943 

202216 

257488 

2761 

3033 

3305 

3577 1 

3648 

272 

N.    0    1    2    3    4    5    6    7    8    9    D. 


Log e = 0.434294; log π = 0.497150; log √π = 0.248575. 
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160 

204120 

204391 

204663 

204934 

205204 

205475 

205746 

206016 

206286 

206556 

271 

t 

6826 

7096 

7365 

7634 

7904 

8173 

8441 

8710 

8979 

9247 

269 

2 

9515 

9783 

210051 

210319 

210586 

210853 

211121 

211388 

211654 

211921 

267 

3 

212188 

2124S4 

2720 

2986 

3252 

3518 

3783 

4049 

4314 

4579 

266 

4 

4844 

5109 

5373 

'‘638 

5902 

6166 

6430 

6694 

6957 

7221 

264 

165 

7434 

7747 

8010 

8273 

8536 

8798 

9060 

9323 

9505 

9846 

262 

6 

220108 

220370 

220631 

220892 

221153 

221414 

221675 

221936 

222196 

222456 

26! 

7 

2716 

2976 

3236 

3496 

3755 

4015 

4274 

4533 

4792 

5051 

259 

8 

5309 

5568 

5826 

6084 

6342 

6600 

6858 

7115 

7372 

7630 

258 

9 

788? 

8144 

8400 

3657 

8913 

9170 

3426 

9682 

9938 

230193 

256 

170 

230449 

230704 

?30n60 

231215 

231470 

231724 

231979 

232234 

2324B8 

232742 

255 

1 

2996 

3 ’50 

3504 

3757 

4011 

4264 

4517 

4770 

5023 

5276 

253 

1 

5528 

5781 

6('13 

6285 

6537 

6789 

7041 

7292 

7544 

7795 

252 

3 

8048 

0:9/ 

054.1 

8799 

9049 

9299 

9550 

9800 

240050 

240.300 

250 

4 

240,'49 

240703 

241048 

241297 

241546 

2417S5 

242044 

242293 

2541 

2/90 

249 

17S 

3033 

3286 

3534 

3782 

4030 

4277 

4525 

4772 

5019 

5266 

248 

6 

5513 

5730 

6006 

6252 

6499 

6/45 

6991 

723/ 

7482 

7728 

246 

V 

79'?3 

6219 

8464 

8709 

8954 

O'OS 

9143 

9687 

9932 

2501/6 

245 

6 

2';0120 

250C51 

750^^08 

251151 

251305 

2516 18 

251381 

252125 

252368 

2610 

243 

9 

2553 

3036 

3538 

3580 

3822 

4001 

4306 

4548 

4790 

son 

242 

160 

255273 

255514 

25575ii 

25S996 

256237 

256477 

256710 

2Sr/9':8 

257193 

25/439 

241 

1 

7679 

73)3 

8158 

am 

8537 

837/ 

Oils 

3355 

9594 

9833 

233 

2 

2600 n 

260210 

760:48 

260/67 

281 r.s 

261203 

26150! 

261739 

261976 

2fi?211 

238 

3 

7451 

2633 

.'•VS 


33f9 

3036 

2373 

4109 

4346 

4587 

237 

4 

461S 

5054 


5d:5 

5/61 

5'j96 

6232 

64^; 

6/0?. 

6937 

23S 

165 

"172 

7405 

7641 

7875 

fillO 

8344 

8578 

S912 

0046 

0779 

2.34 

6 

9511 

9746 

9930 

270213 

27044b 

270/, 79 

avoou 

2/1144 

271 J77 

271609 

233 

7 

271 S42 

272074 

2723U6 

2338 

Z770 

3001 

37.33 

34f4 

3c9e 

392/ 

232 

8 

4158 

4380 

4f>:0 

4850 

5081 

5311 

554?. 

57/7 

6002 

b?3? 

230 

9 

«)462 

6692 

69.’1 

7151 

7330 

7609 

7338 

80v,7 

1 8295 

8S7.S 
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2295 

2342 

2388 

2434 

2481 

2527 

2573 

2619 

46 

9 

2666 

2712 

2758 

2804 

2851 

2897 

2943 

2989 

3035 

3082 

46 

N.    0    1    2    3    4    5    6    7    8    9    D. 
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N. 

0 

1 

2 

3 

4 

6 

6 

7 

8 

9 

D. 

940 

973128 

973174 

973220 

973266 

973313 

973359 

973405 

973451 

973497 

973543 

46 

1 

3590 

3636 

3682 

3728 

3774 

3820 

3866 

3913 

3959 

4005 

46 

2 

4031 

4097 

4143 

4189 

4235 

4281 

1327 

4374 

4420 

4466 

46 

3 

4512 

4558 

4604 

4650 

4695 

4742 

4788 

4834 

4880 

4926 

46 

4 

4972 

5018 

5064 

5110 

5156 

5202 

5248 

5294 

5340 

5386 

46 

945 

5432 

5478 

5524 

5570 

5616 

5662 

5707 

5753 

5799 

5845 

46 

6 

5891 

5937 

5983 

6029 

6075 

6121 

6167 

6212 

6258 

6304 

46 

7 

6350 

6396 

6442 

6488 

6533 

6579 

6625 

€671 

6717 

6763 

46 

8 

6808 

6854 

6900 

6946 

6992 

7037 

7083 

7129 

7175 

7220 

46 

9 

7266 

7312 

7358 

7403 

7449 

7495 

7541 

7586 

7632 

7678 

46 

960 

977724 

977769 

977815 

977861 

977906 

977952 

977998 

978043 

978069 

978135 

46 

1 

8181 

8226 

8272 

831/ 

8363 

8409 

8454 

8500 

8546 

8591 

46 

2 

8637 

8683 

8728 

8774 

8319 

8865 

8911 

8956 

9002 

9047 

46 

3 

9093 

9138 

9184 

9230 

9^75 

9321 

9366 

9412 

9457 

9503 

46 

4 

9548 

9594 

9639 

9685 

9730 

9776 

9821 

9867 

9912 

9958 

46 

965 

980003 

980049 

Q80094 

980140 

980185 

980231 

900276 

980322 

980367 

980412 

45 

6 

0458 

0503 

0549 

0594 

0640 

0685 

0730 

0776 

0821 

0867 

45 

7 

0912 

0957 

1003 

1048 

1093 

1139 

1184 

1229 

1275 

1320 

45 

8 

1366 

1411 

1456 

1501 

1547 

1592 

1637 

1683 

1728 

1773 

45 

9 

1819 

1864 

1909 

1954 

2000 

2045 

2090 

2135 

2181 

2226 

45 

900 

982271 

982316 

982362 

982407 

982452 

982497 

982543 

982588 

982633 

982678 

45 

1 

2723 

2769 

2814 

2859 

2904 

2949 

2994 

3040 

3085 

3130 

45 

2 

3175 

3220 

3265 

3310 

3356 

3401 

3446 

3491 

3536 

3581 

45 

3 

3626 

3671 

3716 

3762 

3807 

3852 

3897 

3942 

3987 

4032 

45 

« 

4077 

4122 

4167 

4212 

4257 

4302 

4347 

4392 

4437 

4482 

45 

965 

4527 

4572 

4617 

4662 

4707 

4752 

4797 

4842 

4087 

4932 

45 


4977 

5022 

5067 

5112 

5157 

5202 

5247 

5292 

5337 

5382 

45 

7 

5426 

5471 

5516 

5561 

5606 

5651 

5696 

5741 

5786 

5030 

45 

8 

5875 

5920 

5965 

6010 

6055 

6100 

6144 

6189 

6234 

6279 

45 

9 

6324 

6369 

6413 

6458 

6603 

6548 

6593 

6637 

6682 

6727 

45 

970 

986772 

986817 

986861 

986906 

986951 

986996 

987040 

987085 

987130 1 

987175 

4$ 

1 

7219 

7264 

7309 

7353 

7398 

7443 

7488 

7532 

7577 

7622 

45 

2 

7666 

7711 

7756 

7800 

7845 

7890 

7934 

7979 

8024 

8068 

45 

3 

8113 

8157 

8202 

8247 

8291 

8336 

8381 

8425 

8470 

8514 

45 

4 

8559 

8604 

8648 

8693 

8737 

8782 

8826 

8871 

8916 

8960 

45 

976 

9005 

9049 

9094 

9138 

9183 

9227 

9272 i 

9316 

936! 

9405 

45 

6 

9450 

9494 

9539 

9583 

9628 

9672 

9717 ' 

9761 

9806 

9850 

44 

7 

9395 

9939 

9983 

990078 

990072 

990117 

990161 

990206 

990250 

990294 

44 

a 

990339 

990383 

990428 

0472 

0516 

0561 

0605 

0650 

i 0694 

0738 

44 

9 

0783 

0827 

0871 

0916 

0960 

1004 

1049 

1093 

1 1137 

1182 

44 

980 

991226 

991270 

991315 

991359 

991403 

991448 

991492 

991536 

! 991580 

991625 

44 

1 

1669 

1713 

1758 

1802 

1846 

1890 

1935 

1979 

2023 

2067 

44 

2 

2111 

2156 

2200 

2244 

2288 

2333 

2377 

2421 

2465 

2509 

44 

3 

2554 

2598 

2642 

2686 

2730 

2774 

2819 

2863 

2307 

2951 

44 

4 

2995 

3039 

3083 

3127 

3172 

3216 

3260 

3304 

3348 

3392 

44 

085 

3436 

3480 

3524 

3568 

3613 

3657 

3701 

3745 

3789 

3833 

44 

6 

3877 

3921 

3965 

4009 

4053 

4097 

4141 

4185 

4229 

4273 

44 

7 

4317 

4361 

4405 

4449 

4493 

4537 

4581 

4625 

4669 

4713 

44 

8 

4757 

4801 ! 

4845 

4889 

4933 

4977 

5021 

5065 

5108 

5152 

44 

9 

5196 

5240 

5284 

5328 

5372 

5416 

5460 

5504 

5S47 

559! 

44 

990 

995635 

995679 

995723 

995767 

995811 

995854 

995898 

995942 

995986 

996030 

44 

1 

6074 

0117 

6161 

6205 

6249 

6293 

6337 

6380 

6424 

6468 

44 

2 

6512 

6555 

6599 

6643 

6607 

6731 

6774 

6818 

6862 

6906 

44 

• 3 

6949 

6993 

7037 

7080 

7124 

7168 

7212 

7255 

7299 

7343 

44 

4 

7386 

7430 

7474 

7517 

7561 

7605 

7648 

7692 

7736 

7779 

44 

995 

7823 

7867 

7910 

7954 

7998 

8041 

8085 

8129 

8172 

6216 

44 

6 

8259 

8303 

8347 

8390 

8434 

8477 

8521 

8564 

8608 

8652 

44 

7 

6695 

8739 

8782 

8826 

8869 

8913 

8956 

9000 

9043 

9087 

44 

8 

9131 

9174 

9218 

9261 

9305 

9348 

9392 

9435 

9479 

9522 

44 

9 

9565 

9609 

9652 

9696 

939 

9783 

9826 

9870 

9913 

9957 

43 

N. 


1 

2 

3 

4 

1 ^ 

6 

7 

8 

9 

D. 



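APPENDIX S
Demonstrations


Section 9.1

To prove that $\Sigma x = 0$.

Let
$$x_1 = X_1 - \bar{X},\quad x_2 = X_2 - \bar{X},\quad \ldots,\quad x_N = X_N - \bar{X}.$$
Then
$$\Sigma x = \Sigma(X - \bar{X}) = \Sigma X - N\bar{X}.$$
But
$$\bar{X} = \frac{\Sigma X}{N}.$$
Therefore,
$$\Sigma x = \Sigma X - N\,\frac{\Sigma X}{N} = 0.$$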


Section 9.2

To prove that $\bar{X} = \bar{X}_d + \dfrac{\Sigma d}{N}$.

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_N}{N}.$$
Adding and subtracting $\bar{X}_d$,
$$\bar{X} = \bar{X}_d + \frac{(X_1 - \bar{X}_d) + (X_2 - \bar{X}_d) + \cdots + (X_N - \bar{X}_d)}{N}.$$
But, by definition,
$$d_1 = X_1 - \bar{X}_d,\quad d_2 = X_2 - \bar{X}_d,\quad \ldots,\quad d_N = X_N - \bar{X}_d.$$
Then
$$\bar{X} = \bar{X}_d + \frac{d_1 + d_2 + \cdots + d_N}{N} = \bar{X}_d + \frac{\Sigma d}{N}.$$
If each item is weighted by its frequency, the expression becomes
$$\bar{X} = \bar{X}_d + \frac{\Sigma fd}{N}.$$

Section 9.3

To prove that $\bar{X} > G$ for a series of positive values not all the same.
$X_1$ and $X_N$ are the smallest and largest values of the series. For these
two values,
$$(X_1 - X_N)^2 > 0;$$
$$X_1^2 - 2X_1X_N + X_N^2 > 0.$$
Adding $4X_1X_N$ to both sides of the inequality gives
$$X_1^2 + 2X_1X_N + X_N^2 > 4X_1X_N.$$
Taking the square root, we have
$$X_1 + X_N > 2\sqrt{X_1X_N} \quad\text{and}\quad \frac{X_1 + X_N}{2} > \sqrt{X_1X_N}.$$
If $X_1$ and $X_N$ are each replaced by $\dfrac{X_1 + X_N}{2}$, the value of $\bar{X}$ for the
entire series is not changed. However, such a replacement increases
the value of $G$, since $\dfrac{X_1 + X_N}{2} > \sqrt{X_1X_N}$ and the contribution of
$\left(\dfrac{X_1 + X_N}{2}\right)^2$ to the geometric mean exceeds the original contribution of
$X_1X_N$. Continually repeating this process for the smallest and largest
remaining values results in continually increasing the value of $G$, which
approaches $\bar{X}$ and equals it following the last substitution, since the
individual values are then all the same.


Section 9.4

To prove that $G > H$ for a series of positive values not all the same.
$X_1$ and $X_N$ are the smallest and largest values of the series. In the
preceding section, it was shown that
$$X_1 + X_N > 2\sqrt{X_1X_N}.$$
Therefore,
$$\sqrt{X_1X_N}\,(X_1 + X_N) > 2X_1X_N \quad\text{and}\quad \sqrt{X_1X_N} > \frac{2X_1X_N}{X_1 + X_N}.$$
But
$$\frac{2X_1X_N}{X_1 + X_N} = \frac{2}{\dfrac{X_1 + X_N}{X_1X_N}} = \frac{2}{\dfrac{1}{X_1} + \dfrac{1}{X_N}},$$
which is $H$.

If $X_1$ and $X_N$ are each replaced by their harmonic mean, $\dfrac{2X_1X_N}{X_1 + X_N}$, the
value of $H$ for the entire series is unchanged. However, such a replacement
decreases the value of $G$, since $\dfrac{2X_1X_N}{X_1 + X_N} < \sqrt{X_1X_N}$ and the contribution
of $\left(\dfrac{2X_1X_N}{X_1 + X_N}\right)^2$ to the geometric mean would be less than the
contribution of $X_1X_N$. Continually repeating this process for the smallest
and largest remaining values results in continually decreasing the
value of $G$, which approaches $H$, and equals it following the last substitution,
since the individual values are then all the same.
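
The two inequalities of Sections 9.3 and 9.4 can be checked numerically. The following is a minimal sketch, not part of the original demonstration; the data values and the helper names are illustrative assumptions. It also performs one step of the replacement process of Section 9.3 and confirms that the arithmetic mean is unchanged while the geometric mean rises.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    # Exponential of the mean logarithm; valid only for positive values.
    return math.exp(sum(math.log(v) for v in values) / len(values))

def harmonic_mean(values):
    return len(values) / sum(1.0 / v for v in values)

def replace_extremes_with_am(values):
    """One step of the Section 9.3 process: the smallest and largest
    values are each replaced by their arithmetic mean."""
    ordered = sorted(values)
    mid = (ordered[0] + ordered[-1]) / 2
    return [mid] + ordered[1:-1] + [mid]

data = [2.0, 4.0, 8.0, 16.0]  # illustrative positive values, not all the same
am = arithmetic_mean(data)
gm = geometric_mean(data)
hm = harmonic_mean(data)
stepped = replace_extremes_with_am(data)
print(am, gm, hm)
print(arithmetic_mean(stepped), geometric_mean(stepped))
```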

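Section 10.1

To prove that $\Sigma d^2$ is smallest when $\bar{X}_d = \bar{X}$; that is, that $\Sigma x^2$ is a
minimum. Here $x = X - \bar{X}$, $d = X - \bar{X}_d$, and $\bar{X}_d$ may be any
designated value, which may or may not be $\bar{X}$. Then
$$\Sigma d^2 = \Sigma(X - \bar{X}_d)^2 = \Sigma X^2 - 2\bar{X}_d\,\Sigma X + N\bar{X}_d^2.$$
But $\bar{X} = \dfrac{\Sigma X}{N}$ and $\Sigma X = N\bar{X}$, so
$$\Sigma d^2 = \Sigma X^2 - 2\bar{X}_d N\bar{X} + N\bar{X}_d^2.$$
Adding and subtracting $N\bar{X}^2$ gives
$$\Sigma d^2 = \Sigma X^2 - N\bar{X}^2 + (N\bar{X}^2 - 2\bar{X}_d N\bar{X} + N\bar{X}_d^2),$$
$$= \Sigma X^2 - N\bar{X}^2 + N(\bar{X}^2 - 2\bar{X}_d\bar{X} + \bar{X}_d^2),$$
$$= \Sigma X^2 - N\bar{X}^2 + N(\bar{X} - \bar{X}_d)^2.$$
If $\bar{X}_d$ is either larger or smaller than $\bar{X}$, the third term, $N(\bar{X} - \bar{X}_d)^2$,
is positive, and therefore $\Sigma d^2$ is smallest when $\bar{X}_d = \bar{X}$, in which case
$\Sigma d^2 = \Sigma x^2$.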
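
As a numerical illustration of the last identity, the sketch below (with illustrative data and a hypothetical helper name, neither taken from the text) checks that the sum of squared deviations about any designated value exceeds the sum about the mean by exactly $N(\bar{X} - \bar{X}_d)^2$.

```python
def sum_sq_dev(values, origin):
    """Sum of squared deviations of the values from a designated origin."""
    return sum((x - origin) ** 2 for x in values)

data = [3.0, 5.0, 6.0, 10.0]      # illustrative values
n = len(data)
mean = sum(data) / n              # 6.0

checks = []
for x_d in (4.0, 6.0, 7.5):       # designated values below, at, and above the mean
    lhs = sum_sq_dev(data, x_d)
    rhs = sum_sq_dev(data, mean) + n * (mean - x_d) ** 2
    checks.append((x_d, lhs, rhs))
    print(x_d, lhs, rhs)
```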

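Section 10.2

To show that
$$s = \sqrt{\frac{\Sigma x^2}{N}} = \sqrt{\frac{\Sigma d^2}{N} - \left(\frac{\Sigma d}{N}\right)^2}.$$
Since $x = X - \bar{X}$,
$$s = \sqrt{\frac{\Sigma(X - \bar{X})^2}{N}} = \sqrt{\frac{\Sigma(X^2 - 2X\bar{X} + \bar{X}^2)}{N}} = \sqrt{\frac{\Sigma X^2 - 2\bar{X}\,\Sigma X + N\bar{X}^2}{N}}.$$
But since $\dfrac{\Sigma X}{N} = \bar{X}$,
$$s = \sqrt{\frac{\Sigma X^2}{N} - 2\bar{X}^2 + \bar{X}^2} = \sqrt{\frac{\Sigma X^2}{N} - \bar{X}^2} = \sqrt{\frac{\Sigma X^2}{N} - \left(\frac{\Sigma X}{N}\right)^2}.$$
By definition, $d = X - \bar{X}_d$, or $X = d + \bar{X}_d$.
Therefore:
$$s = \sqrt{\frac{\Sigma(d + \bar{X}_d)^2}{N} - \left[\frac{\Sigma(d + \bar{X}_d)}{N}\right]^2},$$
$$= \sqrt{\frac{\Sigma(d^2 + 2d\bar{X}_d + \bar{X}_d^2)}{N} - \left(\frac{\Sigma d + N\bar{X}_d}{N}\right)^2},$$
$$= \sqrt{\frac{\Sigma d^2 + 2\bar{X}_d\,\Sigma d + N\bar{X}_d^2}{N} - \frac{(\Sigma d)^2 + 2N\bar{X}_d\,\Sigma d + N^2\bar{X}_d^2}{N^2}},$$
$$= \sqrt{\frac{\Sigma d^2}{N} + 2\bar{X}_d\frac{\Sigma d}{N} + \bar{X}_d^2 - \frac{(\Sigma d)^2}{N^2} - 2\bar{X}_d\frac{\Sigma d}{N} - \bar{X}_d^2},$$
$$= \sqrt{\frac{\Sigma d^2}{N} - \left(\frac{\Sigma d}{N}\right)^2}.$$
For a frequency distribution,
$$s = \sqrt{\frac{\Sigma fd^2}{N} - \left(\frac{\Sigma fd}{N}\right)^2}.$$
Or, with deviations in terms of class intervals,
$$s = \sqrt{\frac{\Sigma fx^2}{N}} = i\sqrt{\frac{\Sigma f(d')^2}{N} - \left(\frac{\Sigma fd'}{N}\right)^2}.$$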
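
The equivalence just shown may be verified with a short computation. This is a sketch under stated assumptions: the data and the designated value 50 are illustrative, and the function names are hypothetical. Both routes should give the same standard deviation whatever designated value is used.

```python
import math

def sd_direct(values):
    """Standard deviation from deviations about the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def sd_from_designated_value(values, x_d):
    """Standard deviation from deviations d = X - x_d about any designated value."""
    n = len(values)
    d = [x - x_d for x in values]
    return math.sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)

data = [42.0, 47.0, 51.0, 55.0, 60.0]   # illustrative values
s_direct = sd_direct(data)
s_short = sd_from_designated_value(data, 50.0)
print(s_direct, s_short)
```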


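Section 10.3

To prove that
$$\pi_3 = \frac{\Sigma f(d')^3}{N} - 3\,\frac{\Sigma fd'}{N}\,\frac{\Sigma f(d')^2}{N} + 2\left(\frac{\Sigma fd'}{N}\right)^3.$$
It was shown in Section 9.2 that
$$\bar{X} = \bar{X}_d + \frac{\Sigma d}{N}.$$
For any selected $X$ value, say $X_1$,
$$x_1 = X_1 - \bar{X} = X_1 - \bar{X}_d - \frac{\Sigma d}{N}.$$
But $X_1 - \bar{X}_d = d_1$; therefore, $x_1 = d_1 - \dfrac{\Sigma d}{N}$.

Similarly, $x_2 = d_2 - \dfrac{\Sigma d}{N}$, $x_3 = d_3 - \dfrac{\Sigma d}{N}$, etc.

Thus,
$$\frac{\Sigma x^3}{N} = \frac{\Sigma d^3 - 3\left(\dfrac{\Sigma d}{N}\right)\Sigma d^2 + 3\left(\dfrac{\Sigma d}{N}\right)^2\Sigma d - N\left(\dfrac{\Sigma d}{N}\right)^3}{N},$$
$$= \frac{\Sigma d^3}{N} - 3\,\frac{\Sigma d}{N}\,\frac{\Sigma d^2}{N} + 3\left(\frac{\Sigma d}{N}\right)^2\frac{\Sigma d}{N} - \left(\frac{\Sigma d}{N}\right)^3,$$
$$= \frac{\Sigma d^3}{N} - 3\,\frac{\Sigma d}{N}\,\frac{\Sigma d^2}{N} + 3\left(\frac{\Sigma d}{N}\right)^3 - \left(\frac{\Sigma d}{N}\right)^3,$$
$$= \frac{\Sigma d^3}{N} - 3\,\frac{\Sigma d}{N}\,\frac{\Sigma d^2}{N} + 2\left(\frac{\Sigma d}{N}\right)^3.$$
For a frequency distribution this becomes
$$\frac{\Sigma fx^3}{N} = \frac{\Sigma fd^3}{N} - 3\,\frac{\Sigma fd}{N}\,\frac{\Sigma fd^2}{N} + 2\left(\frac{\Sigma fd}{N}\right)^3,$$
or, in terms of class intervals cubed,
$$\pi_3 = \frac{\Sigma f(d')^3}{N} - 3\,\frac{\Sigma fd'}{N}\,\frac{\Sigma f(d')^2}{N} + 2\left(\frac{\Sigma fd'}{N}\right)^3.$$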


Section 12.1

The Least-Squares Criterion

The following discussion assumes that the distribution of chance errors
follows the normal curve, and that the best central value from which to
measure such accidental deviations is therefore that value which makes it
most probable that the deviations are distributed normally.

Let a series of such deviations, or errors, and the intervals within which
they fall be designated by the following symbols:

$x_1$ is an item falling at the mid-point of a very small interval, $\Delta x_1$;
$x_2$ is an item falling at the mid-point of a very small interval, $\Delta x_2$;
$x_N$ is an item falling at the mid-point of a very small interval, $\Delta x_N$.

Now the probability that a deviation will fall within a certain interval is
$$\frac{\text{Area of frequency curve within boundaries of that interval}}{\text{Area of entire frequency curve}}.$$
Thus the probability of obtaining an error $x_1$ which falls within the
interval $\Delta x_1$ is approximately the ratio of the area of a rectangle, with base
$\Delta x_1$ and height the ordinate at the mid-point of the interval, to the area
of the entire frequency curve.

If this curve is the normal curve, this probability is
$$\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x_1^2}{2\sigma^2}}\,\Delta x_1,$$
since the expression for the ordinate of a normal curve as a ratio to the
entire number of frequencies is
$$\frac{Y_c}{N} = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}.$$
The probability of obtaining errors $x_2$, $x_3$, etc., falling within specified
intervals is similarly obtained.

The probability that several independent events will occur is the
product of the individual probabilities of the separate events. Therefore,
the probability that the particular set of errors will occur which we have
assumed (that is, a normal distribution of errors) is as follows:
$$P = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{N} e^{-\frac{x_1^2 + x_2^2 + \cdots + x_N^2}{2\sigma^2}} \times \Delta x_1 \times \Delta x_2 \times \cdots \times \Delta x_N.$$
Since a number greater than 1 raised to a negative power is greatest when
that exponent is numerically least, $P$ is greatest when $x_1^2 + x_2^2 + \cdots + x_N^2$ is least.
Therefore, the probability that accidental deviations from some central
value will follow the normal curve is greatest when the sum of the squared
deviations from that central value is at a minimum.


Section 12.2

Derivation of the Normal Equations for a Straight Line
Fitted by the Method of Least Squares

If $Y_c$ is a trend, or computed, value, $Y - Y_c$ is a deviation from trend.
To satisfy the least-squares criterion, $\Sigma(Y - Y_c)^2$ must be a minimum.
Since the straight-line equation type is $Y_c = a + bX$,
$$\Sigma(Y - Y_c)^2 = \Sigma[Y - (a + bX)]^2 = \Sigma(Y - a - bX)^2.$$
Expanding, this expression becomes
$$\Sigma Y^2 - 2a\Sigma Y - 2b\Sigma XY + Na^2 + 2ab\Sigma X + b^2\Sigma X^2. \tag{1}$$
If this expression is minimized with respect to $a$ and $b$, we shall obtain the
two normal equations. Rewriting expression (1) according to descending powers of
$a$ gives
$$Na^2 + 2a(b\Sigma X - \Sigma Y) + \Sigma Y^2 - 2b\Sigma XY + b^2\Sigma X^2.$$
This is a quadratic of the type $pm^2 + qm + r$, where $p$ is $N$, $m$ is $a$, $q$ is
$2(b\Sigma X - \Sigma Y)$, and $r$ is $\Sigma Y^2 - 2b\Sigma XY + b^2\Sigma X^2$. If $p$ is positive (as
it must always be for statistical problems when $p = N$), such a quadratic
has a minimum value when $m = -\dfrac{q}{2p}$. Therefore,
$$a = \frac{-2(b\Sigma X - \Sigma Y)}{2N} = \frac{\Sigma Y - b\Sigma X}{N}. \tag{2}$$
Rewriting (2) gives
$$\Sigma Y = Na + b\Sigma X, \quad\text{the first normal equation.}$$
Rearranging expression (1) according to descending powers of $b$ gives
$$b^2\Sigma X^2 + 2b(a\Sigma X - \Sigma XY) + \Sigma Y^2 - 2a\Sigma Y + Na^2. \tag{3}$$
In this quadratic, $p$ is $\Sigma X^2$, $m$ is $b$, $q$ is $2(a\Sigma X - \Sigma XY)$, and $r$ is
$\Sigma Y^2 - 2a\Sigma Y + Na^2$. Since $\Sigma X^2$ is positive, expression (3) will have a minimum
value when $m = -\dfrac{q}{2p}$, so
$$b = \frac{-2(a\Sigma X - \Sigma XY)}{2\Sigma X^2} = \frac{\Sigma XY - a\Sigma X}{\Sigma X^2}. \tag{4}$$
Rewriting (4) gives
$$\Sigma XY = a\Sigma X + b\Sigma X^2, \quad\text{the second normal equation.}$$
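
The two normal equations can be solved simultaneously for $a$ and $b$. The sketch below is one way of doing so; the function name and the data are illustrative assumptions, not part of the text. The final two printed differences confirm that the fitted constants satisfy both normal equations.

```python
def fit_line(xs, ys):
    """Solve the two normal equations
         sum(Y)  = N*a      + b*sum(X)
         sum(XY) = a*sum(X) + b*sum(X^2)
    for the constants a and b of Yc = a + b*X."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # illustrative data
ys = [2.0, 4.0, 5.0, 4.0, 6.0]
a, b = fit_line(xs, ys)
n = len(xs)
# Residuals of the two normal equations; both should be zero.
first = sum(ys) - (n * a + b * sum(xs))
second = sum(x * y for x, y in zip(xs, ys)) - (a * sum(xs) + b * sum(x * x for x in xs))
print(a, b, first, second)
```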

Section 13.1

Derivation of the Equations for Fitting Growth Curve of the
Type $Y_c = k + ab^X$

Designating by $n$ the number of years in each third of the data, the
first equation (see Equation 1, p. 301) is:
$$\Sigma_1 Y = nk + a + ab + ab^2 + ab^3 + \cdots + ab^{(n-1)},$$
$$= nk + a[1 + b + b^2 + b^3 + \cdots + b^{(n-1)}]. \tag{1}$$
If now the expression inside the brackets be multiplied by $\dfrac{b - 1}{b - 1}$, we have
$$\frac{[1 + b + b^2 + b^3 + \cdots + b^{(n-1)}](b - 1)}{b - 1}$$
$$= \frac{b + b^2 + b^3 + \cdots + b^{(n-1)} + b^n - 1 - b - b^2 - b^3 - \cdots - b^{(n-1)}}{b - 1} \tag{2}$$
$$= \frac{b^n - 1}{b - 1}.$$
The fourth term shown in the numerator of expression (2) is $b^{(n-1)}$. This
follows from the fact that the next-to-the-last term within the brackets
of expression (1) may also be designated as $b^{(n-2)}$; and $b^{(n-2)} \times b = b^{(n-1)}$.
All three equations are obtained in a similar fashion. They are:

I. $\Sigma_1 Y = nk + a\left(\dfrac{b^n - 1}{b - 1}\right)$

II. $\Sigma_2 Y = nk + ab^n\left(\dfrac{b^n - 1}{b - 1}\right)$

III. $\Sigma_3 Y = nk + ab^{2n}\left(\dfrac{b^n - 1}{b - 1}\right)$

Equations A, B, and C now are:

A. $\Sigma_2 Y - \Sigma_1 Y = a\left(\dfrac{b^n - 1}{b - 1}\right)(b^n - 1) = a\,\dfrac{(b^n - 1)^2}{b - 1}$

B. $\Sigma_3 Y - \Sigma_2 Y = ab^n\,\dfrac{(b^n - 1)^2}{b - 1}$

C. $\dfrac{\Sigma_3 Y - \Sigma_2 Y}{\Sigma_2 Y - \Sigma_1 Y} = ab^n\,\dfrac{(b^n - 1)^2}{b - 1} \div a\,\dfrac{(b^n - 1)^2}{b - 1} = b^n$

Therefore,
$$b^n = \frac{\Sigma_3 Y - \Sigma_2 Y}{\Sigma_2 Y - \Sigma_1 Y}.$$
Equation A gives us the formula for $a$:
$$\Sigma_2 Y - \Sigma_1 Y = a\,\frac{(b^n - 1)^2}{b - 1};$$
$$a = (\Sigma_2 Y - \Sigma_1 Y)\,\frac{b - 1}{(b^n - 1)^2}.$$
From Equation I we find:
$$\Sigma_1 Y = nk + \left(\frac{b^n - 1}{b - 1}\right)a;$$
$$k = \frac{1}{n}\left[\Sigma_1 Y - \left(\frac{b^n - 1}{b - 1}\right)a\right].$$
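
The formulas for $b^n$, $a$, and $k$ lend themselves to direct computation from the three partial sums. The following is a minimal sketch, assuming the series has a length divisible by three and that $X$ runs 0, 1, 2, ...; the function name and the test series (generated from $k = 100$, $a = -50$, $b = 0.8$) are illustrative assumptions.

```python
def fit_modified_exponential(ys):
    """Fit Yc = k + a*b**X (X = 0, 1, 2, ...) from the sums of the
    three equal thirds of the series, following Equations A, C, and I."""
    if len(ys) % 3 != 0:
        raise ValueError("series length must be divisible by three")
    n = len(ys) // 3
    s1, s2, s3 = sum(ys[:n]), sum(ys[n:2 * n]), sum(ys[2 * n:])
    b_n = (s3 - s2) / (s2 - s1)          # Equation C; must be positive
    b = b_n ** (1.0 / n)
    a = (s2 - s1) * (b - 1) / (b_n - 1) ** 2   # from Equation A
    k = (s1 - a * (b_n - 1) / (b - 1)) / n     # from Equation I
    return k, a, b

# A series generated exactly from the curve, so the constants are recovered.
series = [100 - 50 * 0.8 ** x for x in range(9)]
k, a, b = fit_modified_exponential(series)
print(k, a, b)
```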


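Section 19.1

To prove that $\bar{Y}_c = \bar{Y}$.
$$Y_c = a + bX.$$
$$\Sigma Y_c = \Sigma(a + bX) = Na + b\Sigma X.$$
But $Na + b\Sigma X = \Sigma Y$ (Normal equation I).

Therefore,
$$\Sigma Y_c = \Sigma Y, \tag{1}$$
$$\frac{\Sigma Y_c}{N} = \frac{\Sigma Y}{N},$$
$$\bar{Y}_c = \bar{Y}. \tag{2}$$

To prove that $\Sigma Y_c^2 = a\Sigma Y + b\Sigma XY$.
$$\Sigma Y_c^2 = \Sigma(a + bX)^2,$$
$$= \Sigma(a^2 + 2abX + b^2X^2),$$
$$= Na^2 + 2ab\Sigma X + b^2\Sigma X^2,$$
$$= a(Na + b\Sigma X) + b(a\Sigma X + b\Sigma X^2).$$
But $Na + b\Sigma X = \Sigma Y$ (Normal equation I), and
$a\Sigma X + b\Sigma X^2 = \Sigma XY$ (Normal equation II).

Therefore,
$$\Sigma Y_c^2 = a\Sigma Y + b\Sigma XY. \tag{3}$$

To prove that $\Sigma y_c^2 = \Sigma Y_c^2 - \bar{Y}\Sigma Y$.

By the procedure shown in footnote 3 of Chapter 21 for $\Sigma x^2$ it may be
shown that
$$\Sigma y^2 = \Sigma Y^2 - \bar{Y}\Sigma Y.$$
Similarly, it is true that $\Sigma y_c^2 = \Sigma Y_c^2 - \bar{Y}_c\Sigma Y_c$.

But $\bar{Y}_c = \bar{Y}$ (Equation 2) and $\Sigma Y_c = \Sigma Y$ (Equation 1).

Therefore,
$$\Sigma y_c^2 = \Sigma Y_c^2 - \bar{Y}\Sigma Y. \tag{4}$$

To prove that $\Sigma y_s^2 = \Sigma Y^2 - \Sigma Y_c^2$.
$$\Sigma y_s^2 = \Sigma(Y - Y_c)^2 = \Sigma Y^2 - 2\Sigma YY_c + \Sigma Y_c^2.$$
But $Y_c = a + bX$; hence,
$$\Sigma YY_c = \Sigma[Y(a + bX)] = \Sigma(aY + bXY) = a\Sigma Y + b\Sigma XY.$$
Now $a\Sigma Y + b\Sigma XY = \Sigma Y_c^2$ (Equation 3).

Therefore,
$$\Sigma y_s^2 = \Sigma Y^2 - 2\Sigma Y_c^2 + \Sigma Y_c^2 = \Sigma Y^2 - \Sigma Y_c^2. \tag{5}$$

To prove that $\Sigma y_c^2 = b\Sigma xy$.
$$\Sigma y_c^2 = \Sigma(bx)^2 = b^2\Sigma x^2 = b\,\frac{\Sigma xy}{\Sigma x^2}\,\Sigma x^2 = b\Sigma xy. \tag{6}$$

To prove that $\Sigma y_s^2 = \Sigma y^2 - \Sigma y_c^2$.
$$\Sigma y_s^2 = \Sigma Y^2 - \Sigma Y_c^2. \quad\text{(Equation 5)}$$
But
$$\Sigma Y^2 = \Sigma y^2 + \bar{Y}\Sigma Y, \quad\text{and}$$
$$\Sigma Y_c^2 = \Sigma y_c^2 + \bar{Y}\Sigma Y. \quad\text{(Equation 4)}$$
Therefore,
$$\Sigma y_s^2 = (\Sigma y^2 + \bar{Y}\Sigma Y) - (\Sigma y_c^2 + \bar{Y}\Sigma Y) = \Sigma y^2 - \Sigma y_c^2. \tag{7}$$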
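
Equations (2), (6), and (7) may be illustrated with a small computation. The sketch below uses illustrative data and variable names that are assumptions, not taken from the text; it fits the least-squares line and then checks that the mean of the computed values equals $\bar{Y}$, that $\Sigma y_c^2 = b\Sigma xy$, and that $\Sigma y^2 = \Sigma y_c^2 + \Sigma y_s^2$.

```python
xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # illustrative data
ys = [2.0, 4.0, 5.0, 4.0, 6.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Least-squares constants with deviations taken from the means.
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sum_x2 = sum((x - x_bar) ** 2 for x in xs)
b = sum_xy / sum_x2
a = y_bar - b * x_bar
y_c = [a + b * x for x in xs]

total = sum((y - y_bar) ** 2 for y in ys)                    # sum of y^2
explained = sum((yc - y_bar) ** 2 for yc in y_c)             # sum of y_c^2
unexplained = sum((y - yc) ** 2 for y, yc in zip(ys, y_c))   # sum of y_s^2
print(total, explained, unexplained)
```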


Section 19.2

Derivation of Constants for Straight-Line Equation
when Origin Is at $\bar{X}$, $\bar{Y}$

The normal equations for fitting a straight line by the method of least
squares are
$$\Sigma Y = Na + b\Sigma X;$$
$$\Sigma XY = a\Sigma X + b\Sigma X^2.$$
If the origin be taken at $\bar{X}$, $\bar{Y}$ instead of 0, 0, we have
$$\Sigma y = Na + b\Sigma x;$$
$$\Sigma xy = a\Sigma x + b\Sigma x^2.$$
But $\Sigma y = 0$, and $\Sigma x = 0$.

Therefore, $a = 0$, and $b = \dfrac{\Sigma xy}{\Sigma x^2}$.

The estimating equation becomes $y_c = bx$ instead of $Y_c = a + bX$.


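Section 19.3

To prove that
$$\frac{\Sigma y_c^2}{\Sigma y^2} = \frac{(\Sigma xy)^2}{\Sigma x^2\,\Sigma y^2}.$$
Since $y_c = bx$, we may write
$$\frac{\Sigma y_c^2}{\Sigma y^2} = \frac{\Sigma(bx)^2}{\Sigma y^2} = \frac{b^2\Sigma x^2}{\Sigma y^2}.$$
From the second normal equation, $b = \dfrac{\Sigma xy}{\Sigma x^2}$. Therefore,
$$\frac{\Sigma y_c^2}{\Sigma y^2} = \left(\frac{\Sigma xy}{\Sigma x^2}\right)^2\frac{\Sigma x^2}{\Sigma y^2} = \frac{(\Sigma xy)^2}{\Sigma x^2\,\Sigma y^2}.$$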


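Section 19.4

To prove that
$$r = \frac{\Sigma xy}{Ns_Xs_Y} = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}.$$
$$\Sigma xy = \Sigma[(X - \bar{X})(Y - \bar{Y})] = \Sigma(XY - \bar{X}Y - X\bar{Y} + \bar{X}\bar{Y}),$$
$$= \Sigma XY - \bar{X}\Sigma Y - \bar{Y}\Sigma X + N\bar{X}\bar{Y},$$
$$= \Sigma XY - N\bar{X}\bar{Y} - N\bar{X}\bar{Y} + N\bar{X}\bar{Y},$$
$$= \Sigma XY - N\bar{X}\bar{Y}.$$
$$s_X = \sqrt{\frac{\Sigma X^2}{N} - \left(\frac{\Sigma X}{N}\right)^2}; \qquad s_Y = \sqrt{\frac{\Sigma Y^2}{N} - \left(\frac{\Sigma Y}{N}\right)^2}.$$
Therefore,
$$r = \frac{\Sigma XY - N\bar{X}\bar{Y}}{N\sqrt{\dfrac{\Sigma X^2}{N} - \left(\dfrac{\Sigma X}{N}\right)^2}\,\sqrt{\dfrac{\Sigma Y^2}{N} - \left(\dfrac{\Sigma Y}{N}\right)^2}},$$
$$= \frac{N(\Sigma XY - N\bar{X}\bar{Y})}{N^2s_Xs_Y},$$
$$= \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}.$$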
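
The agreement of the two expressions for $r$ may be checked as follows. This is a sketch with illustrative data; the function names are hypothetical. One function works from deviations about the means, the other from the original $X$ and $Y$ values only.

```python
import math

def r_from_deviations(xs, ys):
    """r = sum(xy) / (N * s_X * s_Y), with x and y as deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return sum_xy / (n * s_x * s_y)

def r_from_raw_sums(xs, ys):
    """r computed from sums of the original values, without taking deviations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # illustrative data
ys = [2.0, 4.0, 5.0, 4.0, 6.0]
r_dev = r_from_deviations(xs, ys)
r_raw = r_from_raw_sums(xs, ys)
print(r_dev, r_raw)
```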

Section 19.5

Given that $X_1, X_2, \ldots, X_N$ can take values only of the integers 1
through $N$ without duplication or omission, and that the same is true of
$Y_1, Y_2, \ldots, Y_N$, to prove that
$$r_{\text{rank}} = 1 - \frac{6\Sigma D^2}{N(N^2 - 1)}.$$
Paralleling the proof given in Section 24.4 for arithmetic means, it may
be shown that
$$s_D^2 = s_X^2 + s_Y^2 - 2rs_Xs_Y,$$
where $D = X - Y$. From this relationship it follows that
$$r = \frac{s_X^2 + s_Y^2 - \dfrac{\Sigma D^2}{N}}{2s_Xs_Y}.$$
But $\Sigma X^2 = \Sigma Y^2$ when we are dealing with ranks, so that $s_X = s_Y$. Therefore,
$$r_{\text{rank}} = \frac{2s_X^2 - \dfrac{\Sigma D^2}{N}}{2s_X^2} = 1 - \frac{\Sigma D^2}{2Ns_X^2}.$$
Now $\Sigma X$ is the sum of the first $N$ natural numbers, or $\dfrac{N(N + 1)}{2}$,
and $\Sigma X^2$ is the sum of the squares of the first $N$ natural numbers, or
$\dfrac{N(N + 1)(2N + 1)}{6}$. Therefore,
$$Ns_X^2 = \Sigma(X - \bar{X})^2 = \Sigma X^2 - \bar{X}\Sigma X,$$
$$= \frac{N(N + 1)(2N + 1)}{6} - \frac{N + 1}{2}\cdot\frac{N(N + 1)}{2},$$
$$= \frac{2N(N + 1)(2N + 1) - 3N(N + 1)^2}{12},$$
$$= \frac{N(N^2 - 1)}{12}.$$
Substituting in the expression for $r$, we have
$$r_{\text{rank}} = 1 - \frac{\Sigma D^2}{\dfrac{N(N^2 - 1)}{6}} = 1 - \frac{6\Sigma D^2}{N(N^2 - 1)}.$$
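
When there are no tied ranks, the rank formula should agree with the product-moment coefficient computed on the ranks themselves. The sketch below illustrates this with a hypothetical pair of rankings; the function names are assumptions introduced for the illustration.

```python
import math

def rank_correlation(rank_x, rank_y):
    """r_rank = 1 - 6*sum(D^2) / (N*(N^2 - 1)), valid for untied ranks 1..N."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

def product_moment_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_x2 = sum((x - x_bar) ** 2 for x in xs)
    sum_y2 = sum((y - y_bar) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

rank_x = [1, 2, 3, 4, 5]   # illustrative rankings without ties
rank_y = [2, 1, 4, 3, 5]
r_rank = rank_correlation(rank_x, rank_y)
r_pm = product_moment_r(rank_x, rank_y)
print(r_rank, r_pm)
```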


Section 20.1

The point of diminishing absolute returns is the highest point in the
total returns curve. At this point the slope is zero. The slope of a
curve at any point may be found by taking the first derivative of the
equation. The first derivative of the equation $Y_c = a + bX + cX^2 + dX^3$ is
$$\frac{dY_c}{dX} = b + 2cX + 3dX^2.$$
Setting $\dfrac{dY_c}{dX} = 0$, we have
$$X = \frac{-c \pm \sqrt{c^2 - 3bd}}{3d}.$$
For the total returns equation $Y_c = 890.32 + 78.264X + 20.324X^2 - 4.4649X^3$,
the above equation yields $X = -1.337$ and $4.371$. When the
slope is zero, we have a maximum or a minimum point. Only positive
values of $X$ are of interest, and inspection of Chart 20.3 indicates that a
maximum is reached when $X$ is close to 4. Or, if the reader will compute
$Y_c$ values in the neighborhood of $X = -1.337$ and $X = 4.371$, he will
discover that the former is a minimum and the latter a maximum. When
$X = 4.371$, the computed total returns $Y_c = 1{,}247.85$. The point of
diminishing total returns is reached when the input of nitrogen is 4.371
per cent. At this point the estimated yield is 1,247.85 pounds.

The point of diminishing marginal returns is the point of inflection in
the curve. It is the point where the change in the slope is zero. The
change in the slope is the second derivative of the estimating equation.
Thus,
$$\frac{d^2Y_c}{dX^2} = 2c + 6dX.$$
Setting $\dfrac{d^2Y_c}{dX^2} = 0$, we have $X = -\dfrac{c}{3d}$.

For the total returns equation, the point of inflection is $X = 1.517$.
Thus the point of diminishing marginal returns is reached when the input
of nitrogen is 1.517 per cent. At this point the estimated yield is $Y_c =
1{,}040.23$ pounds.
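
The figures quoted above can be reproduced from the constants of the total returns equation. The following sketch is offered only as a check on the arithmetic; the variable and function names are illustrative.

```python
import math

# Constants of Yc = a + b*X + c*X**2 + d*X**3 as given in the text.
a, b, c, d = 890.32, 78.264, 20.324, -4.4649

def total_returns(x):
    return a + b * x + c * x ** 2 + d * x ** 3

def second_derivative(x):
    return 2 * c + 6 * d * x

# Points of zero slope: X = (-c +/- sqrt(c^2 - 3bd)) / (3d).
root = math.sqrt(c * c - 3 * b * d)
stationary = sorted([(-c + root) / (3 * d), (-c - root) / (3 * d)])

# Point of inflection: X = -c / (3d).
inflection = -c / (3 * d)

for x in stationary:
    kind = "maximum" if second_derivative(x) < 0 else "minimum"
    print(round(x, 3), kind, round(total_returns(x), 2))
print(round(inflection, 3), round(total_returns(inflection), 2))
```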

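Section 21.1

Proof that
$$\left(\frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}\right)^2 = \frac{\Sigma x_{c1.23}^2 - \Sigma x_{c1.3}^2}{\Sigma x_1^2 - \Sigma x_{c1.3}^2}.$$
A demonstration for the other formulas of these types would proceed
along similar lines.

If
$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}},$$
then
$$r_{12.3}^2 = \frac{r_{12}^2 - 2r_{12}r_{13}r_{23} + r_{13}^2r_{23}^2}{1 - r_{13}^2 - r_{23}^2 + r_{13}^2r_{23}^2}. \tag{1}$$
But $r_{12}^2 = \dfrac{(\Sigma x_1x_2)^2}{\Sigma x_1^2\,\Sigma x_2^2}$, $r_{12} = \dfrac{\Sigma x_1x_2}{\sqrt{\Sigma x_1^2\,\Sigma x_2^2}}$, and similar formulas obtain for the
other $r$'s. Therefore:
$$r_{12.3}^2 = \frac{\dfrac{(\Sigma x_1x_2)^2}{\Sigma x_1^2\Sigma x_2^2} - 2\,\dfrac{\Sigma x_1x_2}{\sqrt{\Sigma x_1^2\Sigma x_2^2}}\,\dfrac{\Sigma x_1x_3}{\sqrt{\Sigma x_1^2\Sigma x_3^2}}\,\dfrac{\Sigma x_2x_3}{\sqrt{\Sigma x_2^2\Sigma x_3^2}} + \dfrac{(\Sigma x_1x_3)^2}{\Sigma x_1^2\Sigma x_3^2}\cdot\dfrac{(\Sigma x_2x_3)^2}{\Sigma x_2^2\Sigma x_3^2}}{1 - \dfrac{(\Sigma x_1x_3)^2}{\Sigma x_1^2\Sigma x_3^2} - \dfrac{(\Sigma x_2x_3)^2}{\Sigma x_2^2\Sigma x_3^2} + \dfrac{(\Sigma x_1x_3)^2}{\Sigma x_1^2\Sigma x_3^2}\cdot\dfrac{(\Sigma x_2x_3)^2}{\Sigma x_2^2\Sigma x_3^2}}.$$
Multiplying numerator and denominator by $\Sigma x_1^2\,\Sigma x_2^2\,(\Sigma x_3^2)^2$, this simplifies
to the following equation:
$$r_{12.3}^2 = \frac{(\Sigma x_3^2\,\Sigma x_1x_2)^2 - 2\Sigma x_3^2\,\Sigma x_1x_2\,\Sigma x_1x_3\,\Sigma x_2x_3 + (\Sigma x_1x_3)^2(\Sigma x_2x_3)^2}{\Sigma x_1^2\Sigma x_2^2(\Sigma x_3^2)^2 - \Sigma x_2^2\Sigma x_3^2(\Sigma x_1x_3)^2 - \Sigma x_1^2\Sigma x_3^2(\Sigma x_2x_3)^2 + (\Sigma x_1x_3)^2(\Sigma x_2x_3)^2}. \tag{2}$$
We know that
$$r_{12.3}^2 = \frac{\Sigma x_{c1.23}^2 - \Sigma x_{c1.3}^2}{\Sigma x_1^2 - \Sigma x_{c1.3}^2}. \tag{3}$$
But
$$\Sigma x_{c1.3}^2 = b_{13}\,\Sigma x_1x_3 = \frac{\Sigma x_1x_3}{\Sigma x_3^2}\,\Sigma x_1x_3 = \frac{(\Sigma x_1x_3)^2}{\Sigma x_3^2}.$$
Also,
$$\Sigma x_{c1.23}^2 = b_{12.3}\,\Sigma x_1x_2 + b_{13.2}\,\Sigma x_1x_3.$$
Now, the normal equations for obtaining $b_{12.3}$ and $b_{13.2}$ are:

II. $\Sigma x_1x_2 = b_{12.3}\,\Sigma x_2^2 + b_{13.2}\,\Sigma x_2x_3$;

III. $\Sigma x_1x_3 = b_{12.3}\,\Sigma x_2x_3 + b_{13.2}\,\Sigma x_3^2$.

In order to solve for $b_{13.2}$, we may multiply Equation II by $\Sigma x_2x_3$, and
Equation III by $\Sigma x_2^2$, and subtract Equation II from Equation III.
Thus,

II. $\Sigma x_1x_2\,\Sigma x_2x_3 = b_{12.3}\,\Sigma x_2^2\,\Sigma x_2x_3 + b_{13.2}(\Sigma x_2x_3)^2$

III. $\Sigma x_1x_3\,\Sigma x_2^2 = b_{12.3}\,\Sigma x_2^2\,\Sigma x_2x_3 + b_{13.2}\,\Sigma x_2^2\,\Sigma x_3^2$

$$\Sigma x_1x_3\,\Sigma x_2^2 - \Sigma x_1x_2\,\Sigma x_2x_3 = b_{13.2}\,\Sigma x_2^2\,\Sigma x_3^2 - b_{13.2}(\Sigma x_2x_3)^2$$
$$b_{13.2} = \frac{\Sigma x_1x_3\,\Sigma x_2^2 - \Sigma x_1x_2\,\Sigma x_2x_3}{\Sigma x_2^2\,\Sigma x_3^2 - (\Sigma x_2x_3)^2}.$$
In a similar fashion, we may solve for $b_{12.3}$. This involves multiplying
Equation II by $\Sigma x_3^2$ and Equation III by $\Sigma x_2x_3$. By such a process we
find that
$$b_{12.3} = \frac{\Sigma x_1x_3\,\Sigma x_2x_3 - \Sigma x_1x_2\,\Sigma x_3^2}{(\Sigma x_2x_3)^2 - \Sigma x_2^2\,\Sigma x_3^2}.$$
Substituting these expressions for $b_{12.3}$ and $b_{13.2}$ in the equation for $\Sigma x_{c1.23}^2$,
we have
$$\Sigma x_{c1.23}^2 = \frac{\Sigma x_1x_3\,\Sigma x_2x_3 - \Sigma x_1x_2\,\Sigma x_3^2}{(\Sigma x_2x_3)^2 - \Sigma x_2^2\,\Sigma x_3^2}\,\Sigma x_1x_2 + \frac{\Sigma x_1x_3\,\Sigma x_2^2 - \Sigma x_1x_2\,\Sigma x_2x_3}{\Sigma x_2^2\,\Sigma x_3^2 - (\Sigma x_2x_3)^2}\,\Sigma x_1x_3.$$
This simplifies to
$$\Sigma x_{c1.23}^2 = \frac{(\Sigma x_1x_3)^2\,\Sigma x_2^2 + (\Sigma x_1x_2)^2\,\Sigma x_3^2 - 2\Sigma x_1x_2\,\Sigma x_1x_3\,\Sigma x_2x_3}{\Sigma x_2^2\,\Sigma x_3^2 - (\Sigma x_2x_3)^2}.$$
Now substituting our expressions for $\Sigma x_{c1.23}^2$ and $\Sigma x_{c1.3}^2$ in Formula (3), we
have
$$r_{12.3}^2 = \frac{\dfrac{(\Sigma x_1x_3)^2\,\Sigma x_2^2 + (\Sigma x_1x_2)^2\,\Sigma x_3^2 - 2\Sigma x_1x_2\,\Sigma x_1x_3\,\Sigma x_2x_3}{\Sigma x_2^2\,\Sigma x_3^2 - (\Sigma x_2x_3)^2} - \dfrac{(\Sigma x_1x_3)^2}{\Sigma x_3^2}}{\Sigma x_1^2 - \dfrac{(\Sigma x_1x_3)^2}{\Sigma x_3^2}}.$$
Expanding and simplifying, this expression becomes Equation (2).
Therefore,
$$\left(\frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}\right)^2 = \frac{\Sigma x_{c1.23}^2 - \Sigma x_{c1.3}^2}{\Sigma x_1^2 - \Sigma x_{c1.3}^2}.$$

Section 24.1

To prove that $\bar{X}_{\bar{X}} = \bar{X}_{\mathcal{P}}$, when $N_1 = N_2 = \cdots = N_K = N$.
$$\bar{X}_{\bar{X}} = \frac{\bar{X}_1 + \bar{X}_2 + \cdots + \bar{X}_K}{K} = \frac{\dfrac{\Sigma X_1}{N} + \dfrac{\Sigma X_2}{N} + \cdots + \dfrac{\Sigma X_K}{N}}{K} = \frac{\Sigma X_1 + \Sigma X_2 + \cdots + \Sigma X_K}{NK}.$$
Each random sample of $N$ items contains $\dfrac{N}{\mathcal{P}}$ of the population, and each
item will occur $\dfrac{N}{\mathcal{P}}K$ times. Therefore,
$$\frac{\Sigma X_1 + \Sigma X_2 + \cdots + \Sigma X_K}{NK} = \frac{\dfrac{N}{\mathcal{P}}K\sum_{1}^{\mathcal{P}}X}{NK},$$
where $\sum_{1}^{\mathcal{P}}$ indicates a summation over the items in the population,
$$= \frac{\sum_{1}^{\mathcal{P}}X}{\mathcal{P}} = \bar{X}_{\mathcal{P}}.$$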


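Section 24.2

To prove that $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{N}}\sqrt{\dfrac{\mathcal{P} - N}{\mathcal{P} - 1}}$, when $N_1 = N_2 = \cdots = N_K = N$.

The scheme of the random samples appears as follows:

Item     Sample 1     Sample 2     Sample 3
a        $X_{a1}$     $X_{a2}$     $X_{a3}$
b        $X_{b1}$     $X_{b2}$     $X_{b3}$
c        $X_{c1}$     $X_{c2}$     $X_{c3}$
N        $X_{N1}$     $X_{N2}$     $X_{N3}$

There are $K$ samples. The individual items are replaced after each
sample has been drawn.

We shall use

$\sum_{1}^{K}$ to indicate a summation over the $K$ samples;

$\sum_{1}^{\mathcal{P}}$ to indicate a summation over the items in the population;

$\Sigma$ to indicate a summation over a sample, over a particular sample if a
subscript follows $X$; thus, $\Sigma X_1$ is the sum of the $X$ values in sample 1;
and

$x$ to mean $X - \bar{X}_{\mathcal{P}}$, a usage of $x$ employed only in this proof.

The deviations of the items from the population mean are $x_{a1} = X_{a1} - \bar{X}_{\mathcal{P}}$,
$x_{b1} = X_{b1} - \bar{X}_{\mathcal{P}}$, $\ldots$, $x_{N1} = X_{N1} - \bar{X}_{\mathcal{P}}$, $x_{a2} = X_{a2} - \bar{X}_{\mathcal{P}}$, etc.

We can therefore write the various items as $\bar{X}_{\mathcal{P}} + x_{a1}$, $\bar{X}_{\mathcal{P}} + x_{b1}$, $\ldots$,
$\bar{X}_{\mathcal{P}} + x_{N1}$, $\bar{X}_{\mathcal{P}} + x_{a2}$, etc.

For Sample 1: $\Sigma X_1 = N\bar{X}_{\mathcal{P}} + \Sigma x_1$,

For Sample 2: $\Sigma X_2 = N\bar{X}_{\mathcal{P}} + \Sigma x_2$,

and so forth,

where $\Sigma x_1 \neq 0$, $\Sigma x_2 \neq 0$, etc., since $x = X - \bar{X}_{\mathcal{P}}$.

Adding a constant to (or subtracting a constant from) a series of values
does not alter the value of the standard deviation of those values, so that
$$\sigma_{\Sigma X} = \sigma_{\Sigma x}.$$
For the $K$ samples,
$$\sigma_{\Sigma x}^2 = \frac{\sum_{1}^{K}(\Sigma x)^2}{K},$$
since
$$\sum_{1}^{K}(\Sigma x) = \Sigma x_1 + \Sigma x_2 + \cdots + \Sigma x_K = 0,$$
and
$$K\sigma_{\Sigma x}^2 = \sum_{1}^{K}(\Sigma x)^2 = \sum_{1}^{K}(x_a + x_b + x_c + \cdots + x_N)^2.$$
For any one sample,
$$(x_a + x_b + x_c + \cdots + x_N)^2 = x_a^2 + x_ax_b + x_ax_c + \cdots + x_ax_N$$
$$\qquad + x_ax_b + x_b^2 + x_bx_c + \cdots + x_bx_N$$
$$\qquad + x_ax_c + x_bx_c + x_c^2 + \cdots + x_cx_N$$
$$\qquad + \cdots$$
$$\qquad + x_ax_N + x_bx_N + x_cx_N + \cdots + x_N^2,$$
$$= \Sigma x_i^2 + 2\Sigma x_ix_j,$$
where $x_i$ represents any item and $x_ix_j$ represents the product resulting
from each combination of two different items. Therefore, for the $K$
samples,
$$K\sigma_{\Sigma x}^2 = \sum_{1}^{K}(\Sigma x_i^2 + 2\Sigma x_ix_j),$$
$$= \sum_{1}^{K}(\Sigma x_i^2) + 2\sum_{1}^{K}(\Sigma x_ix_j).$$
Each sample of $N$ items contains $\dfrac{N}{\mathcal{P}}$ of the population, and each item
will occur in $\dfrac{N}{\mathcal{P}}$ of the samples, or $\dfrac{N}{\mathcal{P}}K$ times. If a given item ($x_i$) occurs
in $\dfrac{N}{\mathcal{P}}$ of the samples, a second item ($x_j$) will occur in $\dfrac{N - 1}{\mathcal{P} - 1}$ of the samples
in which the first item occurs, and both items will occur in $\dfrac{N}{\mathcal{P}}\cdot\dfrac{N - 1}{\mathcal{P} - 1}$
of the samples, or $\dfrac{N(N - 1)}{\mathcal{P}(\mathcal{P} - 1)}K$ times. Thus, each $x_ix_j$ will occur
$\dfrac{N(N - 1)}{\mathcal{P}(\mathcal{P} - 1)}K$ times.
Therefore,
$$K\sigma_{\Sigma x}^2 = \frac{N}{\mathcal{P}}K\sum_{1}^{\mathcal{P}}x_i^2 + 2\,\frac{N(N - 1)}{\mathcal{P}(\mathcal{P} - 1)}K\sum_{1}^{\mathcal{P}}x_ix_j,$$
$$\sigma_{\Sigma x}^2 = \frac{N}{\mathcal{P}}\sum_{1}^{\mathcal{P}}x_i^2 + 2\,\frac{N(N - 1)}{\mathcal{P}(\mathcal{P} - 1)}\sum_{1}^{\mathcal{P}}x_ix_j.$$
By a development similar to that shown above for $(\Sigma x)^2$ for one sample,
we have
$$\left(\sum_{1}^{\mathcal{P}}x_i\right)^2 = \sum_{1}^{\mathcal{P}}x_i^2 + 2\sum_{1}^{\mathcal{P}}x_ix_j.$$
But $\sum_{1}^{\mathcal{P}}x_i = 0$. Therefore, $2\sum_{1}^{\mathcal{P}}x_ix_j = -\sum_{1}^{\mathcal{P}}x_i^2$, and
$$\sigma_{\Sigma x}^2 = \frac{N}{\mathcal{P}}\sum_{1}^{\mathcal{P}}x_i^2 - \frac{N(N - 1)}{\mathcal{P}(\mathcal{P} - 1)}\sum_{1}^{\mathcal{P}}x_i^2,$$
$$= N\sigma^2 - \frac{N(N - 1)}{\mathcal{P} - 1}\sigma^2,$$
$$= N\sigma^2\left[1 - \frac{N - 1}{\mathcal{P} - 1}\right],$$
$$= N\sigma^2\left[\frac{(\mathcal{P} - 1) - (N - 1)}{\mathcal{P} - 1}\right],$$
$$= N\sigma^2\,\frac{\mathcal{P} - N}{\mathcal{P} - 1}.$$
$$\sigma_{\Sigma X} = \sigma\sqrt{N}\sqrt{\frac{\mathcal{P} - N}{\mathcal{P} - 1}}.$$
Since each sample consists of $N$ items, each deviation of a sample sum
from the arithmetic mean of the sample sums is $N$ times as large as each
corresponding deviation of a sample mean from the arithmetic mean of
the sample means, $\bar{X}_{\mathcal{P}}$, and each squared deviation of a sample sum is $N^2$
times the squared deviation of each sample mean. Therefore, the standard
deviation of the sample sums is $N$ times the standard deviation of the
sample means. Dividing each side of the last equation by $N$ gives
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}\sqrt{\frac{\mathcal{P} - N}{\mathcal{P} - 1}}.$$
If $\mathcal{P}$ is infinite, or if $\mathcal{P}$ is finite but large in relation to $N$, so that the
value of $\sqrt{\dfrac{\mathcal{P} - N}{\mathcal{P} - 1}}$ is effectively 1, the expression may be written
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}.$$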
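
For a very small population the results of Sections 24.1 and 24.2 can be confirmed by listing every possible sample. The sketch below assumes a hypothetical population of five values and samples of two items drawn without duplication within a sample; all names are illustrative.

```python
import itertools
import math

population = [2.0, 4.0, 6.0, 8.0, 10.0]   # illustrative population
n = 2                                     # items per sample
p = len(population)

mu = sum(population) / p
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / p)

# Every possible sample of n different items, each taken once.
means = [sum(s) / n for s in itertools.combinations(population, n)]
mean_of_means = sum(means) / len(means)
sd_of_means = math.sqrt(sum((m - mean_of_means) ** 2 for m in means) / len(means))

# Standard error from the formula with the finite-population factor.
formula = sigma / math.sqrt(n) * math.sqrt((p - n) / (p - 1))
print(mean_of_means, sd_of_means, formula)
```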

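Section 24.3

To show that $\sigma^2 = \dfrac{\hat{\sigma}_1^2 + \hat{\sigma}_2^2 + \cdots + \hat{\sigma}_K^2}{K}$, when $N_1 = N_2 = \cdots = N_K = N$.

The variation of a single sample from $\bar{X}_{\mathcal{P}}$ is $\Sigma(X - \bar{X}_{\mathcal{P}})^2$. This may
be divided into two parts:
$$\Sigma(X - \bar{X}_{\mathcal{P}})^2 = \Sigma[(X - \bar{X}) + (\bar{X} - \bar{X}_{\mathcal{P}})]^2,$$
where $\bar{X}$ represents the mean of a sample,
$$= \Sigma[(X - \bar{X})^2 + 2(X - \bar{X})(\bar{X} - \bar{X}_{\mathcal{P}}) + (\bar{X} - \bar{X}_{\mathcal{P}})^2],$$
$$= \Sigma(X - \bar{X})^2 + 2(\bar{X} - \bar{X}_{\mathcal{P}})\Sigma(X - \bar{X}) + N(\bar{X} - \bar{X}_{\mathcal{P}})^2.$$
But $\Sigma(X - \bar{X}) = 0$, and, therefore,
$$\Sigma(X - \bar{X}_{\mathcal{P}})^2 = \Sigma(X - \bar{X})^2 + N(\bar{X} - \bar{X}_{\mathcal{P}})^2.$$
Summing for the $K$ samples,
$$\sum_{1}^{K}\left[\Sigma(X - \bar{X}_{\mathcal{P}})^2\right] = \sum_{1}^{K}\left[\Sigma(X - \bar{X})^2\right] + \sum_{1}^{K}\left[N(\bar{X} - \bar{X}_{\mathcal{P}})^2\right].$$
Each random sample of $N$ items contains $\dfrac{N}{\mathcal{P}}$ of the population, and each
item will occur $\dfrac{N}{\mathcal{P}}K$ times. Considering each of the three parts of the
preceding expression separately, we have
$$\sum_{1}^{K}\left[\Sigma(X - \bar{X}_{\mathcal{P}})^2\right] = \frac{N}{\mathcal{P}}K\sum_{1}^{\mathcal{P}}(X - \bar{X}_{\mathcal{P}})^2 = NK\,\frac{\sum_{1}^{\mathcal{P}}(X - \bar{X}_{\mathcal{P}})^2}{\mathcal{P}} = NK\sigma^2.$$
$$\sum_{1}^{K}\left[\Sigma(X - \bar{X})^2\right] = \sum_{1}^{K}(Ns^2) = N\sum_{1}^{K}s^2,$$
where $s^2$ is the variance, $s^2 = \dfrac{\Sigma x^2}{N}$, of a sample.
$$\sum_{1}^{K}\left[N(\bar{X} - \bar{X}_{\mathcal{P}})^2\right] = N\sum_{1}^{K}(\bar{X} - \bar{X}_{\mathcal{P}})^2 = NK\sigma_{\bar{X}}^2.$$
We may now write
$$NK\sigma^2 = N\sum_{1}^{K}s^2 + NK\sigma_{\bar{X}}^2,$$
and, dividing by $K$,
$$N\sigma^2 = N\overline{s^2} + N\sigma_{\bar{X}}^2,$$
where $\overline{s^2}$ is the arithmetic mean of the $s^2$ values,
$$= N\overline{s^2} + N\,\frac{\sigma^2}{N},$$
$$= N\overline{s^2} + \sigma^2.$$
$$N\sigma^2 - \sigma^2 = N\overline{s^2}.$$
$$\sigma^2(N - 1) = N\overline{s^2}.$$
$$\sigma^2 = \frac{N}{N - 1}\,\overline{s^2} = \frac{N}{N - 1}\cdot\frac{s_1^2 + s_2^2 + \cdots + s_K^2}{K},$$
$$= \frac{N}{N - 1}\cdot\frac{\dfrac{\Sigma x_1^2}{N} + \dfrac{\Sigma x_2^2}{N} + \cdots + \dfrac{\Sigma x_K^2}{N}}{K},$$
$$= \frac{\dfrac{\Sigma x_1^2}{N - 1} + \dfrac{\Sigma x_2^2}{N - 1} + \cdots + \dfrac{\Sigma x_K^2}{N - 1}}{K},$$
$$= \frac{\hat{\sigma}_1^2 + \hat{\sigma}_2^2 + \cdots + \hat{\sigma}_K^2}{K}.$$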
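
The relation $\sigma^2 = \dfrac{N}{N - 1}\overline{s^2}$ rests on $\sigma_{\bar{X}}^2 = \sigma^2/N$, which holds when the population is infinite or when items are replaced as drawn. The sketch below therefore lists every ordered sample drawn with replacement from a small hypothetical population; this sampling scheme and all names are assumptions made for the illustration.

```python
import itertools

population = [2.0, 4.0, 6.0, 8.0, 10.0]   # illustrative population
n = 2                                     # items per sample
p = len(population)

mu = sum(population) / p
sigma2 = sum((x - mu) ** 2 for x in population) / p

def sample_variance(sample):
    """Variance of a sample about its own mean, with divisor N."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)

# All ordered samples of n items, drawn with replacement.
samples = list(itertools.product(population, repeat=n))
mean_s2 = sum(sample_variance(s) for s in samples) / len(samples)
estimate = n / (n - 1) * mean_s2
print(sigma2, mean_s2, estimate)
```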


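Section 24.4

To prove that $\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2}$ for independent samples.

Given two independent series of paired arithmetic means, the means
being for random samples of the same size, and each series consisting of
$K$ means, as follows:

Sample     Series 1           Series 2           Difference
1          $\bar{X}_{1.1}$    $\bar{X}_{2.1}$    $\bar{X}_{1.1} - \bar{X}_{2.1}$
2          $\bar{X}_{1.2}$    $\bar{X}_{2.2}$    $\bar{X}_{1.2} - \bar{X}_{2.2}$
3          $\bar{X}_{1.3}$    $\bar{X}_{2.3}$    $\bar{X}_{1.3} - \bar{X}_{2.3}$
K          $\bar{X}_{1.K}$    $\bar{X}_{2.K}$    $\bar{X}_{1.K} - \bar{X}_{2.K}$

The variance of the differences is
$$\sigma_{\bar{X}_1 - \bar{X}_2}^2 = \frac{\sum_{1}^{K}\left[(\bar{X}_1 - \bar{X}_2) - \overline{(\bar{X}_1 - \bar{X}_2)}\right]^2}{K},$$
where $\overline{(\bar{X}_1 - \bar{X}_2)}$ is the arithmetic mean of the differences and may be
written
$$\frac{\sum_{1}^{K}(\bar{X}_1 - \bar{X}_2)}{K} = \frac{\sum_{1}^{K}\bar{X}_1}{K} - \frac{\sum_{1}^{K}\bar{X}_2}{K} = \bar{\bar{X}}_1 - \bar{\bar{X}}_2,$$
where $\bar{\bar{X}}_1$ and $\bar{\bar{X}}_2$ are the arithmetic means of series 1 and series 2,
so that
$$\sigma_{\bar{X}_1 - \bar{X}_2}^2 = \frac{\sum_{1}^{K}\left[(\bar{X}_1 - \bar{X}_2) - (\bar{\bar{X}}_1 - \bar{\bar{X}}_2)\right]^2}{K} = \frac{\sum_{1}^{K}\left[(\bar{X}_1 - \bar{\bar{X}}_1) - (\bar{X}_2 - \bar{\bar{X}}_2)\right]^2}{K}.$$
Writing $\bar{x}_1 = \bar{X}_1 - \bar{\bar{X}}_1$ and $\bar{x}_2 = \bar{X}_2 - \bar{\bar{X}}_2$, we have
$$\sigma_{\bar{X}_1 - \bar{X}_2}^2 = \frac{\sum_{1}^{K}(\bar{x}_1 - \bar{x}_2)^2}{K} = \frac{\sum_{1}^{K}(\bar{x}_1^2 - 2\bar{x}_1\bar{x}_2 + \bar{x}_2^2)}{K},$$
$$= \frac{\sum_{1}^{K}\bar{x}_1^2}{K} - 2\,\frac{\sum_{1}^{K}\bar{x}_1\bar{x}_2}{K} + \frac{\sum_{1}^{K}\bar{x}_2^2}{K}.$$
Now, $\dfrac{\sum_{1}^{K}\bar{x}_1\bar{x}_2}{K}$ is a portion of the expression for the correlation coefficient
for the two series of means, which may be written $r_{\bar{x}_1\bar{x}_2} = \dfrac{\sum_{1}^{K}\bar{x}_1\bar{x}_2}{K\sigma_{\bar{X}_1}\sigma_{\bar{X}_2}}$ (see
page 465 for the product-moment formula for $r$ for a sample), so that
$$2\,\frac{\sum_{1}^{K}\bar{x}_1\bar{x}_2}{K} = 2r_{\bar{x}_1\bar{x}_2}\sigma_{\bar{X}_1}\sigma_{\bar{X}_2}. \quad\text{Also, } \frac{\sum_{1}^{K}\bar{x}_1^2}{K} = \sigma_{\bar{X}_1}^2 \text{ and } \frac{\sum_{1}^{K}\bar{x}_2^2}{K} = \sigma_{\bar{X}_2}^2.$$
Therefore,
$$\sigma_{\bar{X}_1 - \bar{X}_2}^2 = \sigma_{\bar{X}_1}^2 - 2r_{\bar{x}_1\bar{x}_2}\sigma_{\bar{X}_1}\sigma_{\bar{X}_2} + \sigma_{\bar{X}_2}^2, \quad\text{and}$$
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma_{\bar{X}_1}^2 - 2r_{\bar{x}_1\bar{x}_2}\sigma_{\bar{X}_1}\sigma_{\bar{X}_2} + \sigma_{\bar{X}_2}^2}.$$
Since the two series of means are independent, $r_{\bar{x}_1\bar{x}_2} = 0$ and
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2}.$$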

Section 24.5

$\dfrac{\hat{\sigma}_1^2 + \hat{\sigma}_2^2}{2}$ is an equally weighted average of $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$. Using weights
equal to the number of degrees of freedom ($N_1 - 1$ and $N_2 - 1$) in each
of the two samples, we have
$$\hat{\sigma}_{1+2}^2 = \frac{(N_1 - 1)\hat{\sigma}_1^2 + (N_2 - 1)\hat{\sigma}_2^2}{N_1 - 1 + N_2 - 1},$$
$$= \frac{(N_1 - 1)\dfrac{\Sigma x_1^2}{N_1 - 1} + (N_2 - 1)\dfrac{\Sigma x_2^2}{N_2 - 1}}{N_1 - 1 + N_2 - 1},$$
$$= \frac{\Sigma x_1^2 + \Sigma x_2^2}{N_1 - 1 + N_2 - 1}.$$

Section 24.6


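To prove that $\hat{\sigma}_{1+2}\sqrt{\dfrac{1}{N_1} + \dfrac{1}{N_2}} = \sqrt{\dfrac{\hat{\sigma}_1^2}{N_1} + \dfrac{\hat{\sigma}_2^2}{N_2}}$ when $N_1 = N_2 = N$.
$$\hat{\sigma}_{1+2}\sqrt{\frac{1}{N} + \frac{1}{N}} = \sqrt{\frac{\hat{\sigma}_{1+2}^2}{N} + \frac{\hat{\sigma}_{1+2}^2}{N}},$$
$$= \sqrt{\frac{\dfrac{(N - 1)\hat{\sigma}_1^2 + (N - 1)\hat{\sigma}_2^2}{N - 1 + N - 1}}{N} + \frac{\dfrac{(N - 1)\hat{\sigma}_1^2 + (N - 1)\hat{\sigma}_2^2}{N - 1 + N - 1}}{N}},$$
$$= \sqrt{\frac{\dfrac{(N - 1)(\hat{\sigma}_1^2 + \hat{\sigma}_2^2)}{2N - 2}}{N} + \frac{\dfrac{(N - 1)(\hat{\sigma}_1^2 + \hat{\sigma}_2^2)}{2N - 2}}{N}},$$
$$= \sqrt{\frac{\dfrac{\hat{\sigma}_1^2 + \hat{\sigma}_2^2}{2}}{N} + \frac{\dfrac{\hat{\sigma}_1^2 + \hat{\sigma}_2^2}{2}}{N}},$$
$$= \sqrt{\frac{\hat{\sigma}_1^2 + \hat{\sigma}_2^2}{N}},$$
$$= \sqrt{\frac{\hat{\sigma}_1^2}{N_1} + \frac{\hat{\sigma}_2^2}{N_2}}.$$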


Section 25.1

To prove that $\sigma_p = \sqrt{\dfrac{\pi\tau}{N}}$.

A proportion $p$ is the arithmetic mean of a series of values where each
occurrence equals 1 and each non-occurrence equals zero.

For a sample, we have:

                     Number     Proportion
Occurrences            a            p
Non-occurrences        b            q
Total                  N           1.0

It is obvious that $a = Np$ and $b = Nq$.

Since an occurrence equals 1 and a non-occurrence equals zero, we have
$$\bar{X} = \frac{a(1) + b(0)}{N} = \frac{a}{N} = p,$$
and it follows that $\sigma_{\bar{X}} = \sigma_p = \dfrac{\sigma}{\sqrt{N}}$.

To obtain an expression for $\sigma$, we use the following population symbols:

                     Number     Proportion
Occurrences          $\alpha$     $\pi$
Non-occurrences      $\beta$      $\tau$
Total                $\mathcal{P}$     1.0

It is clear that $\pi = \dfrac{\alpha}{\mathcal{P}}$ and $\tau = \dfrac{\beta}{\mathcal{P}}$.

Again, each occurrence equals 1 and each non-occurrence equals zero,
so that
$$\sigma = \sqrt{\frac{\alpha(1 - \pi)^2 + \beta(0 - \pi)^2}{\mathcal{P}}} = \sqrt{\pi\tau^2 + \tau\pi^2} = \sqrt{\pi\tau(\tau + \pi)},$$
$$= \sqrt{\pi\tau}.$$
We may now write
$$\sigma_p = \frac{\sigma}{\sqrt{N}} = \frac{\sqrt{\pi\tau}}{\sqrt{N}} = \sqrt{\frac{\pi\tau}{N}}.$$
Since $a = Np$, we may also write
$$\sigma_a = N\sigma_p = N\sqrt{\frac{\pi\tau}{N}} = \sqrt{N\pi\tau}.$$
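
The expressions for $\sigma_p$ and $\sigma_a$ may be checked against the exact distribution of the number of occurrences. The sketch below assumes an infinite population, so that the number of occurrences in a sample follows the binomial distribution; the function name and the values $\pi = 0.3$, $N = 20$ are illustrative.

```python
import math

def proportion_mean_and_sd(pi, n):
    """Mean and standard deviation of the sample proportion p = a/N,
    computed from the exact binomial distribution of the count a."""
    tau = 1 - pi
    probs = [math.comb(n, a) * pi ** a * tau ** (n - a) for a in range(n + 1)]
    mean_p = sum(pr * a / n for a, pr in enumerate(probs))
    var_p = sum(pr * (a / n - mean_p) ** 2 for a, pr in enumerate(probs))
    return mean_p, math.sqrt(var_p)

pi, n = 0.3, 20
mean_p, sd_p = proportion_mean_and_sd(pi, n)
formula_sd_p = math.sqrt(pi * (1 - pi) / n)   # sigma_p
formula_sd_a = math.sqrt(n * pi * (1 - pi))   # sigma_a = N * sigma_p
print(mean_p, sd_p, formula_sd_p, formula_sd_a)
```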


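Section 26.1

To prove that
$$\sum_{\text{col}}\left[N_c(\bar{X}_c - \bar{X})^2\right] = \sum_{\text{col}}\left[\frac{(\Sigma_c X)^2}{N_c}\right] - \frac{(\Sigma X)^2}{N},$$
where $\Sigma_c X$ is the sum of the items in a column, $\sum_{\text{col}}$ is a summation over
all columns, and $\Sigma X$ is the sum of all $N$ items.

The expression on the left says: "For each column, square the deviation
of the column mean from the grand mean, multiply by the number of
items in the column, and sum these products for all columns."
$$\sum_{\text{col}}\left[N_c(\bar{X}_c - \bar{X})^2\right] = \sum_{\text{col}}\left[N_c(\bar{X}_c^2 - 2\bar{X}_c\bar{X} + \bar{X}^2)\right],$$
$$= \sum_{\text{col}}(N_c\bar{X}_c^2 - 2N_c\bar{X}_c\bar{X} + N_c\bar{X}^2),$$
$$= \sum_{\text{col}}(N_c\bar{X}_c^2) - 2\bar{X}\sum_{\text{col}}(N_c\bar{X}_c) + \sum_{\text{col}}(N_c\bar{X}^2).$$
Since $N_c\bar{X}_c = \Sigma_c X$, we have $N_c\bar{X}_c^2 = \dfrac{(\Sigma_c X)^2}{N_c}$ and $\sum_{\text{col}}(N_c\bar{X}_c) = \Sigma X = N\bar{X}$; also
$\sum_{\text{col}}N_c = N$. Therefore,
$$\sum_{\text{col}}\left[N_c(\bar{X}_c - \bar{X})^2\right] = \sum_{\text{col}}\left[\frac{(\Sigma_c X)^2}{N_c}\right] - 2N\bar{X}^2 + N\bar{X}^2 = \sum_{\text{col}}\left[\frac{(\Sigma_c X)^2}{N_c}\right] - \frac{(\Sigma X)^2}{N}.$$

Section 26.2

The expression on the left says: "For each column, total the squared
deviations from the mean of that column and sum these totals for all
columns."

Section 26.3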


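To prove that
$$\sqrt{\frac{r^2(N - 2)}{1 - r^2}} = \sqrt{\frac{b^2\Sigma x^2(N - 2)}{\Sigma y_s^2}}.$$
Since $b = \dfrac{\Sigma xy}{\Sigma x^2}$,
$$\frac{(\Sigma xy)^2}{\Sigma x^2} = b^2\Sigma x^2, \quad\text{and}$$
$$\frac{r^2(N - 2)}{1 - r^2} = \frac{\dfrac{(\Sigma xy)^2}{\Sigma x^2\,\Sigma y^2}(N - 2)}{\dfrac{\Sigma y_s^2}{\Sigma y^2}} = \frac{\dfrac{(\Sigma xy)^2}{\Sigma x^2}(N - 2)}{\Sigma y_s^2} = \frac{b^2\Sigma x^2(N - 2)}{\Sigma y_s^2}.$$
Therefore,
$$\sqrt{\frac{r^2(N - 2)}{1 - r^2}} = \sqrt{\frac{b^2\Sigma x^2(N - 2)}{\Sigma y_s^2}}.$$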
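
The equality of the two expressions may be illustrated numerically. The sketch below uses illustrative data and names that are assumptions, not taken from the text; it computes the quantity once from $r$ and once from $b$, $\Sigma x^2$, and $\Sigma y_s^2$.

```python
import math

xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # illustrative data
ys = [2.0, 4.0, 5.0, 4.0, 6.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sum_x2 = sum((x - x_bar) ** 2 for x in xs)
sum_y2 = sum((y - y_bar) ** 2 for y in ys)

b = sum_xy / sum_x2
a = y_bar - b * x_bar
# Sum of squared deviations of Y from the computed values (unexplained variation).
sum_ys2 = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r2 = sum_xy ** 2 / (sum_x2 * sum_y2)
t_from_r = math.sqrt(r2 * (n - 2) / (1 - r2))
t_from_b = math.sqrt(b * b * sum_x2 * (n - 2) / sum_ys2)
print(t_from_r, t_from_b)
```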




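Section 26.4

To prove that $t^2 = F$ for coefficients of partial correlation. That is,
that
$$\frac{r_{1m.23\cdots(m-1)}^2(N - m)}{1 - r_{1m.23\cdots(m-1)}^2} = \frac{(\Sigma x_{c1.234\cdots m}^2 - \Sigma x_{c1.234\cdots(m-1)}^2)(N - m)}{\Sigma x_1^2 - \Sigma x_{c1.234\cdots m}^2}.$$
Since
$$r_{1m.23\cdots(m-1)}^2 = \frac{\Sigma x_{c1.234\cdots m}^2 - \Sigma x_{c1.234\cdots(m-1)}^2}{\Sigma x_1^2 - \Sigma x_{c1.234\cdots(m-1)}^2},$$
we may write
$$\frac{r_{1m.23\cdots(m-1)}^2(N - m)}{1 - r_{1m.23\cdots(m-1)}^2} = \frac{\dfrac{\Sigma x_{c1.234\cdots m}^2 - \Sigma x_{c1.234\cdots(m-1)}^2}{\Sigma x_1^2 - \Sigma x_{c1.234\cdots(m-1)}^2}\,(N - m)}{1 - \dfrac{\Sigma x_{c1.234\cdots m}^2 - \Sigma x_{c1.234\cdots(m-1)}^2}{\Sigma x_1^2 - \Sigma x_{c1.234\cdots(m-1)}^2}},$$
$$= \frac{\dfrac{\Sigma x_{c1.234\cdots m}^2 - \Sigma x_{c1.234\cdots(m-1)}^2}{\Sigma x_1^2 - \Sigma x_{c1.234\cdots(m-1)}^2}\,(N - m)}{\dfrac{\Sigma x_1^2 - \Sigma x_{c1.234\cdots m}^2}{\Sigma x_1^2 - \Sigma x_{c1.234\cdots(m-1)}^2}},$$
$$= \frac{(\Sigma x_{c1.234\cdots m}^2 - \Sigma x_{c1.234\cdots(m-1)}^2)(N - m)}{\Sigma x_1^2 - \Sigma x_{c1.234\cdots m}^2}.$$

Rounding Numbers*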


Terminology 

Original data result from measurements (which can never be exact) or 
from counting. Measurements will therefore always be rounded; counts 
may be rounded. A number which is the result of rounding always 
represents a range of possible values rather than a single value. Thus, if 
such a number is recorded as 78 pounds, we know that the true value is 
not lower than 77.5 pounds nor higher than 78.5 pounds.

A digit is significant if the error in the next position to the right does
not exceed ±5. Thus, if a measurement is recorded as 172.3 pounds, we
assume that the correct value does not lie beyond the limits of 172.3 ±
0.05, or 172.25 pounds and 172.35 pounds, and there are four significant
digits. It is sometimes difficult to ascertain the number of significant
digits, even in an enumeration. Thus, it is extremely unlikely that there
were exactly 150,697,361 persons in the United States on April 1, 1950,
as reported by the Bureau of the Census.

Below are given three illustrations of correct terminology for measure- 
ments that have been accurately made and properly recorded, or for 
rounded enumerations: 

127.34 is said to contain five significant digits. It has been rounded 
to five significant digits, or to two significant decimal places. 

4,125 thousand or 4.125 million or 4,125 × 10³ or 4,125,ooo, is significant
to four digits. If occurring in a table, usually 4,125 is recorded, with
a prefatory note or column heading specifying thousands. The number
of significant digits in 4,125,000 is ambiguous, since it may range from
four to seven. The context, however, often indicates the number of
significant digits. There is no ambiguity if a number ends in zero after
a decimal point. Thus 4,125.0 and 4.1250 each have five significant
digits.

* This discussion of rounding numbers is from F. E. Croxton and D. J. Cowden,
Practical Business Statistics, Second Edition, Prentice-Hall, Inc., New York, 1948,
pp. 503-506.

0.00031 contains two rather than five significant digits (though 0.10031
contains five and 1.00031 contains six). This is because the choice of a
unit of measurement is arbitrary. For instance, 0.031 meters is also 31
millimeters. The importance of this concept will be apparent when rules
for multiplying and dividing rounded numbers are given.

Rules for Rounding 

1. If the leftmost of the digits discarded is less than 5, the preceding 
digit is not affected. Thus 113.746 becomes 113.7 when rounded to four 
digits. 

2. If the leftmost of the digits discarded is greater than 5, or is 5
followed by digits not all of which are zero if carried out to a sufficient
number of digits, the preceding digit is increased by one. Thus, 129.673
becomes 129.7 when rounded to four digits. Also, 87.2500001 becomes
87.3 when rounded to three digits.

3. If the leftmost of the digits discarded is 5, followed by zeros, the pre- 
ceding digit is increased by one if it is odd, and left unchanged if it is 
even. The number is thus rounded in such a manner that the last digit 
retained is even. For example, 103.55 becomes 103.6 and 103.45 becomes 
103.4 when rounded to four digits. (However, 103.5499 becomes 103.5 
as explained in paragraph 1, and 103.4501 becomes 103.5 as explained in 
paragraph 2.) This rule is adopted in order to avoid the cumulation of 
errors in summations, which could result if the preceding digit were always 
raised or always left unchanged. The rule (making the last digit even) 
is more generally used than its reverse (making the last digit odd). It 
is more convenient than alternately adding and dropping the half, since 
one is spared the trouble of remembering which was done last. 
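The three rules above amount to what is now usually called round-half-to-even. The sketch below is an illustration only, not part of the original text: the helper name `round_sig` is an assumption, and the numbers are passed as strings so that Python's `decimal` module sees exactly the digits recorded.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_sig(value: str, digits: int) -> Decimal:
    """Round a recorded number to `digits` significant digits.

    Uses the half-even rule: a discarded exact half raises the last
    retained digit only when that digit is odd.
    """
    number = Decimal(value)
    if number.is_zero():
        return number
    # Position (power of ten) of the last digit to be retained.
    exponent = number.adjusted() - digits + 1
    return number.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_EVEN)

# The illustrations given in rules 1 to 3.
for text, digits in [("113.746", 4), ("129.673", 4), ("87.2500001", 3),
                     ("103.55", 4), ("103.45", 4),
                     ("103.5499", 4), ("103.4501", 4)]:
    print(text, "->", round_sig(text, digits))
```

Passing strings rather than binary floats matters here: a float such as 103.45 is not stored exactly, so the "5 followed by zeros" case of rule 3 could not be detected reliably.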

Products and Quotients Obtained from Rounded Numbers 

1. In multiplication (including squaring), division, or extraction of
square root, one should not record as a final answer more digits than there
are in the original number with the fewest significant digits.¹ The following
illustrations thus indicate the maximum number of digits which it is
good practice to record:

    358 × 412             =  147 thousand.
    14 × 427              =  6.0 thousand.
    3,194 × 25 × 427      =  34 million.
    4,831 × 0.00412       =  19.9
    5,673 × 8 (exactly)   =  45.38 thousand
    25 ÷ 23               =  1.1
    42.7 ÷ 52             =  0.82
    52 ÷ 42.7             =  1.2
    √0.354                =  0.595

In the above illustrations the maximum number of digits that may be
significant is recorded; in some instances the number significant will be
fewer than the number recorded.²

2. If a given number of significant digits is required in the final answer,
each of the original numbers and each of the intermediate results should
have one more significant digit than the number of digits required in the
answer. If any of the original data contain more digits than called for
by this rule, the excess digits may be rounded off. Thus, if three digits
are required in the final answer, we may proceed as follows:

$$\sqrt{\frac{(2.7608)^2}{(13.195)(0.87367)}} \approx \sqrt{\frac{(2.761)^2}{(13.20)(0.8737)}} = \sqrt{\frac{7.623}{11.53}} = \sqrt{0.6611} = 0.813.$$

As is almost always the case, the final answer is the same as if we had
retained all of the original digits and also one more digit in each
intermediate step:

$$\sqrt{\frac{(2.7608)^2}{(13.195)(0.87367)}} = \sqrt{\frac{7.6220}{11.528}} = \sqrt{0.66117} = 0.813.$$

The rounding of the original data is justified because of the small
probability that most of the numbers involved will be in error close to
the maximum possible amount, and the large probability that there will
be considerable offsetting of errors.

¹ In special circumstances an exception may be made to this rule, provided the
number of digits that are significant in the answer is clearly indicated.

Where several computations involving multiplication, division, or extracting a
square root are involved in working with one set of data, it is sometimes advisable
to record one more digit in intermediate computations than there are in the original
number with the fewest significant digits. Sometimes more than one nonsignificant
digit may be desirable. In this volume we have sometimes carried more than one
nonsignificant digit in order to obtain a formal check on the accuracy of our
computations. While the extra digits may not be absolutely accurate, they are
sufficiently close to contribute something to the final answer. For instance, if we want
three digits in our final answer and have (4.137 × 0.684) ÷ (0.316 × 7.831) we
would employ 2.830 ÷ 2.475 = 1.14 rather than 2.83 ÷ 2.47 = 1.15.

² In the case of the seventh illustration there is, strictly speaking, only one
significant digit in the answer. Remembering that a rounded number recorded as 42.7
may vary between 42.65 and 42.75, while one recorded as 52 may vary between 51.5
and 52.5, we may compute:

    42.75 ÷ 51.5 = .830 to three digits, the largest possible result;
    42.7 ÷ 52 = .821 to three digits;
    42.65 ÷ 52.5 = .812 to three digits, the smallest possible result.

Since .830 and .812 are not included within .821 ± .005, it is apparent that the
second digit in .821 is not significant.

3. When the correct product or quotient is known in advance, it should
be recorded rather than the approximate product or quotient resulting
from use of the rounded original numbers. Thus, although 0.125 ×
0.333 = 0.0416, if it is known that the actual operation is 1/8 × 1/3 =
0.0417, the answer should be recorded as 0.0417 rather than 0.0416.
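The reasoning used to test whether a digit of a quotient is significant (a recorded number may be off by half a unit in its last place, so the quotient has a smallest and a largest possible value) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the helper name `quotient_bounds` is hypothetical, both numbers are assumed positive, and they are passed as strings so that the number of recorded decimal places is preserved.

```python
from decimal import Decimal

def quotient_bounds(numerator: str, denominator: str):
    """Return (smallest, recorded, largest) values of numerator / denominator.

    Each recorded number is assumed to be in error by at most half a unit
    in its last recorded place. Both numbers are assumed to be positive.
    """
    num = Decimal(numerator)
    den = Decimal(denominator)
    # Half a unit in the last recorded place of each number.
    half_num = Decimal(1).scaleb(num.as_tuple().exponent) / 2
    half_den = Decimal(1).scaleb(den.as_tuple().exponent) / 2
    smallest = (num - half_num) / (den + half_den)
    largest = (num + half_num) / (den - half_den)
    return smallest, num / den, largest

# 42.7 may lie between 42.65 and 42.75; 52 may lie between 51.5 and 52.5.
low, recorded, high = quotient_bounds("42.7", "52")
print(round(low, 3), round(recorded, 3), round(high, 3))
```

For 42.7 ÷ 52 the three values round to .812, .821, and .830, so the spread exceeds .821 ± .005 and the second digit of the quotient cannot be relied upon.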

Sums and Differences Obtained from Rounded Numbers 

Rules for addition and subtraction substantially parallel those for 
multiplication and division, except that it is the number of significant 
decimal places, rather than the number of significant digits, that must be 
considered. 

1. In addition or subtraction, one should never record as a final answer
more decimal places than there are in the original number with the fewest
significant decimal places. The following illustrations thus indicate the
maximum number of digits which it is good practice to record:

    2,156.2 + 39 = 2,195.
    2,156.2 - 39 = 2,117.
    13 + 12 = 25.
    13 - 12 = 1.

In the above illustrations the maximum number of significant decimal
places is recorded; in some instances the number significant will be fewer
than the number recorded.³

2. If a given number of significant decimal places is required in the
final answer, it is desirable that each of the original numbers have one
more significant decimal place than the number of decimal places required
in the answer. If any of the original data contain more digits than called
for by this rule, the excess digits may be rounded off. Thus, if no decimal
place (no digit to the right of the decimal point) is required in the final
answer, we may proceed as follows:


    122.34                              122.3
     81.7      may be rounded to         81.7
    293.826                             293.8
    -------                             -----
    497.866                             497.8

both of which round to 498.

The rounding of the original data is justified because of the small
probability that most of the numbers involved will be in error close to the
maximum possible amount, and the large probability that there will be
considerable offsetting of errors.

³ If the student will check the last two results by a procedure similar to that
described in footnote 2, he will find that the last digit recorded is not significant, since
the limits of error are ±1.0, instead of the permissible ±0.5.
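Rule 2 for sums (keep one more decimal place in each item than the answer requires) can be tried on the figures of the illustration. The sketch below is only an illustration: the helper names `round_places` and `sum_with_guard_digit` are assumptions, and rounding follows the half-even rule given earlier in this appendix.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_places(value: Decimal, places: int) -> Decimal:
    """Round to a given number of decimal places, exact halves going to even."""
    return value.quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_EVEN)

def sum_with_guard_digit(values, places: int) -> Decimal:
    """Round each item to one extra decimal place, add, then round the total."""
    kept = [round_places(Decimal(v), places + 1) for v in values]
    return round_places(sum(kept, Decimal(0)), places)

items = ["122.34", "81.7", "293.826"]

# Total of the unrounded items, rounded to no decimal places.
exact_total = round_places(sum((Decimal(v) for v in items), Decimal(0)), 0)
# Total of the items first rounded to one decimal place.
guarded_total = sum_with_guard_digit(items, 0)

print(exact_total, guarded_total)
```

Both routes give 498 for these three items; the guard digit is what keeps the individual rounding errors from disturbing the last place required.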


3. When the correct total is known in advance, it should be recorded,
rather than the approximate total resulting from addition of the rounded
numbers. Thus:

                                                      Thousands    Per cent
                                           Dollars   of dollars   of total*

                                           507,334      507.3       66.67
                                           126,832      126.8       16.67
                                           126,834      126.8       16.67

    Total of recorded numbers              761,000      760.9      100.01
    Record the total known to be correct   761,000      761.0      100.00

* Computed from column 1. Total would not be exactly 100, even if 7 digits were
recorded for each percentage.
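The way rounded percentages can fail to add to exactly 100 is easy to reproduce. In this sketch the helper name `percent_shares` is an assumption, and the first dollar amount is taken as 507,334, the value consistent with the known total of 761,000.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def percent_shares(amounts, places: int = 2):
    """Express each amount as a per cent of the total, rounded to `places`."""
    total = Decimal(sum(amounts))
    step = Decimal(1).scaleb(-places)
    return [
        (Decimal(amount) * 100 / total).quantize(step, rounding=ROUND_HALF_EVEN)
        for amount in amounts
    ]

dollars = [507334, 126832, 126834]   # first amount assumed, see lead-in
shares = percent_shares(dollars)

print(shares)          # the individual rounded percentages
print(sum(shares))     # their sum, which need not equal 100.00
```

The rounded shares are 66.67, 16.67, and 16.67, which add to 100.01; since the true total is known to be 100, that is the figure to record.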

Selected List of Readily Available
Sources of Data


For each source, the current title, frequency of appearance, and
issuing organization are given. Many of the sources have had titles
different from those shown, have appeared more or less frequently than
at present, or have been released by different organizations or by the
same organization under a different name. For such changes, see the
introductory paragraphs in the sources.

A. GENERAL

Statistical data from more than one field will be found in these 
publications of a general nature. 

1. An Almanack (also known as Whitaker's Almanack). Annual.
    Joseph Whitaker, London.

2. County and City Data Book, 1952. One previous issue, dated 1949.
    There is also a County Data Book, dated 1947. Bureau of the
    Census.

3. Distribution Data Guide. Monthly. Department of Commerce.

4. The Economic Almanac. Annual. Published by Thomas Y. Crowell
    Company, New York, for the National Industrial Conference
    Board.

5. Economic Indicators. Monthly. An historical and descriptive
    supplement was issued December 1953. Joint Committee [of
    Congress] on the Economic Report.

6. Federal Reserve Bulletin. Monthly. Board of Governors of the
    Federal Reserve System.

7. The Handbook of Basic Economic Statistics. Monthly. Economic
    Statistics Bureau of Washington, D. C. (A private organization.)

8. Historical Statistics of the United States 1789-1945 and Continuation
    to 1952 of Historical Statistics of the United States. Both are
    supplements to the Statistical Abstract of the United States.
    Bureau of the Census.

9. Monthly Bulletin of Statistics. Statistical Office of the United
    Nations, New York.

10. Standard and Poor's Trade and Securities Statistics. Current Statistics,
    issued monthly, contains cumulative data available since the
    previous issue of Current Statistics Combined with Basic Statistics.
    This latter publication supplements the eleven basic statistics
    pamphlets on various topics and the 1952 edition of Security
    Price Index Record. Standard and Poor's Corporation, New
    York.

11. The Statesman's Yearbook. Macmillan and Company, Limited,
    London.

12. Statistical Abstract of the United States. Annual. Bureau of the
    Census.

13. Statistical Yearbook. Statistical Office of the United Nations, New
    York.

14. Survey of Current Business. Monthly with weekly supplements.
    Biennial supplements entitled Business Statistics are also issued.
    Office of Business Economics of the Department of Commerce.

15. The World Almanac and Book of Facts. Annual. New York
    World-Telegram and The Sun.

Periodicals such as:

16. Barron's. Weekly. Barron's Publishing Company, New York.

17. Business Week. McGraw-Hill Publishing Company, New York.

18. The Magazine of Wall Street. Bi-weekly. The Ticker Publishing
    Company, New York.

Daily newspapers. 

B. COMMODITIES: PRICES, PRODUCTION,
CONSUMPTION, STOCKS, EXPORTS, AND IMPORTS

1. Agricultural Prices. Monthly. Agricultural Marketing Service.

2. Agricultural Situation. Monthly. Agricultural Marketing Service.

3. Agricultural Statistics. Annual. Before 1935, statistical material
    was in the Yearbook of Agriculture. Department of Agriculture.

4. Annual Survey of Manufactures. Bureau of the Census.

5. Census of Agriculture. Quinquennial since 1920, decennial 1840-1920.
    Bureau of the Census.

6. Census of Business. Latest, 1948; previous censuses, 1929, 1933,
    1935, and 1939. Data for 1954 collected in 1955. Bureau of
    the Census.
7. Census of Manufactures. Latest, 1947; none taken 1940-1946;
    biennial 1921-1939, quinquennial 1904-1919, decennial (1829
    omitted) 1809-1899. Data for 1954 collected in 1955. Bureau
    of the Census.

8. Census of Mines and Quarries. Latest, 1939; approximately
    decennial 1840-1939. Data for 1954 collected in 1955. Bureau of
    the Census.

9. Commodity Yearbook. Not published 1943-1947. Commodity
    Research Bureau, Inc., New York.

10. Consumer Price Index. Monthly. Bureau of Labor Statistics.

11. Crops and Markets. Annual. Agricultural Marketing Service.

12. Daily Index Numbers and Spot Primary Market Prices. Weekly.
    Daily data available but no daily mailings. Bureau of Labor
    Statistics.

13. Foreign Commerce Weekly. Bureau of Foreign Commerce.

14. Foreign Trade Reports. Monthly and annual. Bureau of the
    Census.

15. Minerals Yearbook. Bureau of Mines.

16. Monthly Bulletin of Agricultural Economics and Statistics. Food and
    Agriculture Organization of the United Nations. Rome, Italy.

17. Monthly Labor Review. Bureau of Labor Statistics.

18. Monthly Retail Trade Report. Bureau of the Census.

19. Monthly Wholesale Trade Report, Sales and Inventories. Bureau of
    the Census.

20. Quarterly Summary of Foreign Commerce of the United States. Bureau
    of the Census.

21. Retail Food Prices by Cities. Monthly. Bureau of Labor Statistics.

22. Retail Prices and Indexes of Fuels and Electricity. Monthly. Bureau
    of Labor Statistics.

23. Sales Management Survey of Buying Power. Annual. Sales
    Management [Magazine], New York.

24. Wholesale Price Index [Monthly], Prices and Price Relatives for
    Individual Commodities. Monthly. Bureau of Labor Statistics.

25. Wholesale Price Index [Weekly] and Percent Change in Spot Market
    Indexes and For Selected Commodities. Weekly. Bureau of
    Labor Statistics.

26. Wholesale (Primary Market) Price Index. Monthly. Bureau of
    Labor Statistics.

Special studies of the various services and divisions of the Department of
Agriculture, of the Bureau of Labor Statistics, and of state agricultural
experiment stations.
C. FINANCIAL: MONEY, BANKING, SECURITIES,
INTEREST RATES, TAXATION, ETC.

1. Annual Report of the Board of Governors of the Federal Reserve System.

2. Annual Report of the Comptroller of the Currency.

3. Annual Report of the Federal Deposit Insurance Corporation.

4. Annual Report of the Secretary of the Treasury on the State of the
    Finances.

5. Annual Report of the Securities and Exchange Commission.

6. Annual reports of state banking departments.

7. Assets and Liabilities of Operating Insured Banks. Semiannual.
    Federal Deposit Insurance Corporation.

8. Bulletin of the Treasury Department. Monthly. Department of the
    Treasury.

9. The Commercial and Financial Chronicle. Semiweekly. William B.
    Dana Co., New York.

10. Daily Statements of the United States Treasury. Daily and
    semimonthly. Department of the Treasury.

11. Dun's Statistical Review. Monthly. Dun and Bradstreet, Inc.,
    New York.

12. Federal Reserve Charts on Bank Credit, Money Rates, and Business.
    Monthly with annual supplements. Board of Governors of the
    Federal Reserve System.

13. Income Distribution in the United States. Data for 1950, 1947, 1946,
    and 1944. Office of Business Economics.

14. International Financial Statistics. Monthly. International
    Monetary Fund, Washington, D. C.

15. National Income and Product in the United States. The 1954 edition
    replaces the 1951 edition. Office of Business Economics.

16. Statistical Bulletin. Monthly. Securities and Exchange
    Commission.

17. Statistics of Income. Annual. Internal Revenue Service.

Bulletins of the individual Federal Reserve Banks. 

Bulletins of various large banks. 

Data concerning city and state finances are to be found in reports issued 
from time to time by the Bureau of the Census. 

D. EMPLOYMENT, WAGES, AND HOURS OF LABOR 

1. Employment and Earnings. Monthly. Bureau of Labor Statistics. 

2. The Labor Market and Employment Security. Monthly. Bureau of 

Employment Security. 
3. Monthly Labor Review. Bureau of Labor Statistics.

4. Monthly Report on the Labor Force. A Current Population Report.
    Bureau of the Census.

5. Yearbook of Labour Statistics. International Labour Office. Geneva.

Bulletins of state bureaus of labor or industrial commissions. 

Special bulletins of the Bureau of Labor Statistics and of the Women’s 
Bureau. 

E. ACTIVITIES OF INDIVIDUAL CONCERNS 

1. Best's Insurance Reports (fire and casualty) and Best's Life Insurance
    Reports. Annual. Alfred M. Best Company, New York.

2. Fitch Bond Record. Weekly. The Fitch Publishing Company, New
    York.

3. Fitch Individual Bond Bulletins. Listed and unlisted bonds. Four
    each week. The Fitch Publishing Company, New York.

4. Fitch Individual Stock Bulletins. Listed stocks. Five each week.
    The Fitch Publishing Company, New York.

5. Fitch Stock Record. Monthly. The Fitch Publishing Company,
    New York.

6. Fitch Unlisted Securities Service. Unlisted stocks. Four each week.
    The Fitch Publishing Company, New York.

7. Media Records. Newspapers and newspaper advertisers. Monthly,
    quarterly, and annual; also special reports. Media Records,
    Inc., New York.

8. Moody's Bond Survey. Weekly. Moody's Investors Service, New
    York.

9. Moody's Manual of Investments. Five volumes: industrials;
    railroads; public utilities; governments and municipals; banks,
    insurance, real estate, and investment trusts. Annual with
    semiweekly bulletins. Moody's Investors Service, New York.

10. Moody's Stock Survey. Weekly. Moody's Investors Service, New
    York.

11. Security Owners Stock Guide. Monthly and year-end. Standard
    and Poor's Corporation, New York.

12. The Spectator Insurance Year Book. Two volumes: life; fire and
    marine, casualty, and surety. Annual. The Spectator Company,
    Philadelphia.

13. Standard Corporation Records. Daily dividend section with weekly,
    monthly, and annual cumulations; daily news section with partial
    cumulations each month; descriptions of corporations
    continuously revised resulting in complete revision each year.
    Standard and Poor's Corporation, New York.

Reports of state insurance commissioners.

Annual reports of corporations to their stockholders. 

F. MISCELLANEOUS 

1. Annual Report of the Commissioner of Internal Revenue.

2. Annual Report of the Immigration and Naturalization Service.

3. Automobile Facts and Figures. Annual. Automobile Manufacturers
    Association, Detroit.

4. Census of Housing. 1950 and 1940. Bureau of the Census.

5. Census of Population. Decennial. Bureau of the Census.

6. Construction Review. Monthly. Bureau of Labor Statistics and the
    Building Materials and Construction Division of the Department
    of Commerce.

7. Current Population Reports. Deal with labor force (monthly, see
    reference D-4), population estimates, population characteristics,
    special population censuses, and consumer income.
    Intervals of issue vary. Bureau of the Census.

8. Demographic Yearbook of the United Nations. New York.

9. Dodge Statistical Research Service. Construction data. Monthly.
    F. W. Dodge Corporation, New York.

10. Electric Power Statistics. Monthly. Federal Power Commission.

11. Highway Statistics. Annual. Bureau of Public Roads.

12. Life Insurance Fact Book. Annual. Institute of Life Insurance,
    New York.

13. Monthly Review. Railroad Retirement Board.

14. Monthly Survey of Life Insurance Sales in the United States and
    Canada. Life Insurance Agency Management Association,
    Hartford.

15. Monthly Vital Statistics Report. National Office of Vital Statistics.

16. Motor Truck Facts. Annual. Automobile Manufacturers
    Association, Detroit.

17. Municipal Yearbook. International City Managers Association, 

Chicago. 

18. Public Health Reports. Monthly. Public Health Service. 

19. Social Security Bulletin. Monthly. Social Security Board. 

20. Statistical Handbook of Civil Aviation. Annual with quarterly supple- 

ments. Civil Aeronautics Administration. 

21. Statistics of Railways in the United States. Annual. Interstate Com- 

merce Commission. 

22. Statistics of the Communications Industry in the United States.
    Annual. Federal Communications Commission.

23. Vital Statistics of the United States. Annual. National Office of
    Vital Statistics.

24. A Yearbook of Railroad Information. Eastern Railroad Presidents'
    Conference, New York.

Bulletins of university bureaus of social, economic, and business research.

Monographs and special studies of the Bureau of the Census, the Bureau
of Foreign Commerce, the Office of Business Economics, the Bureau
of Labor Statistics, the Office of Education, the Agricultural Marketing
Service, and numerous other governmental offices, bureaus,
commissions, and boards.

Statistical information concerning specific industries may be had from
trade papers and trade associations.

A list of sources of data is given on pp. 306-307 of Business Statistics for
1953. See reference A-14, above.
Adler, F., 689n

Aggregative price index numbers:
  simple, 405-406
  weighted, 406-413
    approximate weights, 412-413
    average quantities, 410
    base-period quantities, 409
    given-year quantities, 409-410
    group weights, 417-418
    highest common factor, 410-411
    "ideal," 411-412
    Marshall-Edgeworth, 410
Aggregative quantity index numbers, 421-423
Agricultural Marketing Service Indexes, 439-441
Alienation, coefficient of, 463n
Alphas, 231, 234, 239, 619-622
American Institute of Public Opinion, sampling method of, 32
A. T. and T. Index of Industrial Activity, 444
Amplitude ratio, 361
  moving, 362
Analysis of variance (see also Variance and Variation):
  described, 706, 709-711
  one criterion of classification, 706-711
  test of seasonal index, 339
  two criteria of classification:
    one entry in a box, 711-714
    several entries in a box, 711-714
  used in correlation:
    multiple correlation, 733-734
    non-linear correlation, 728-732
    partial correlation, 735
    two-variable linear correlation, 723n
Area sample, 29
Arithmetic mean:
  behavior of, from samples, 626-634
  comparison of several from samples (see Analysis of variance)
  confidence limits of, 648-650, 653-654
  definition, 173
  dispersion of, from samples, 632-634
  graphic location, frequency curve, 192-193
  kurtosis of, from samples, 629-631
  mean of, from samples, 627
  modified forms, 182-183, 322-323, 335-339
  of averages, 184-186
  of grouped data:
    long method, 176-179
    open-end classes, 182
    short methods, 179-181
    unequal class intervals, 181-182
  of percentages, 151-152, 183-184, 680
  of ungrouped data, 173-174
  properties of, 174-176
  significance tests of difference between:
    sample mean and population mean, 635-650
    two sample means, 651-667
  skewness of, from samples, 627-629
  standard error of, from samples, 632-633, 807-810
Arithmetic mean, median, and mode, characteristics of:
  algebraic treatment, 193
  extreme values, effect of, 195-196
  familiarity of, 192
  graphic location of, 193
  irregularity of data, effect of, 196
  mathematical properties of, 197
  need for classifying data, 193-194
  open-end classes, effect of, 194-195
  reliability of, 197
  selection of appropriate measure, 197-198
  skewness, effect of, 195
  unequal class intervals, effect of, 194
Arithmetic probability paper, 607
Arithmetic progression, 93-94, 103, 104
Arrangement in tables:
  alphabetical, 58
  customary, 59
  geographical, 58
  historical, 59
  magnitude, 59
  numerical, 60
  progressive, 59-60
Array, 154-156
Asymmetrical curve (see Skewed curve)
Asymmetry (see Skewness)
Asymptotic growth curves (see Modified exponential; Gompertz; Logistic)
Average (see Central tendency)
Average deviation, 215
Average-of-relatives index number (see Index numbers)
Axes, for curves, 68-71
Ayres' Index of State School Systems, 396
Ayres, L. P., 396
B

Bar chart:
  compared with simple curve, 122-123
  complex types, 121-123
  component-part, 125-130
  frequency distribution column diagram, 73-74
  simple, 119-120
Barton, H. C., 330n, 350n
Base line, 80
Betas:
  coefficients in correlation, 557-558
  measures of skewness and kurtosis, 220-239, 619-622
  significance of measures of skewness and kurtosis, 720-722
  tables, 754-765
Bias, 7
  in a sample, 31, 33-34
Biased estimate of σ, 644
Bi-modality, 191-192
Binomial:
  and normal curve, 591-594
  fitting of, 607-613
  used with sample proportions, 661-663, 667-672, 675-679
Birth rates, 144
Brinton, W. C., 76n
Bruce, D., 457, 488
Brumbaugh, M. A., 388n
Burns, A. F., 392n, 579n
Business activity, indexes of, 442-446
Business activity in Pittsburgh, index of, 445-446
Business cycles (see Cyclical movements)

C 

Calendar, flexible, of working days, 738-739
Calendar variation, adjustment for, 253-256, 325-328
Campbell, N. H., 591n
Camp-Meidell inequality, 221
Card punch, 42-43
Causation confused with association, 9-10, 469-470
Central tendency, measures of (see also Arithmetic mean, Geometric mean,
    Harmonic mean, Median, and Mode):
  arithmetic mean, 173-185, 416
  comparison of arithmetic mean, geometric mean, and harmonic mean,
    191-201, 204-209, 793-794
  comparison of arithmetic mean, median, and mode, 192-198
  geometric mean, 198-203, 418-420
  harmonic mean, 203-209, 420, 430
  median, 185-187
  mode, 189-191
  modified mean, 182-183, 322-323, 335-339
  quadratic mean, 209
Chaddock, R. E., 150n, 480n
Chain index:
  advantages and disadvantages, 431-432
  description, 431
  illustration, 431
Changing seasonal, 247-248
  progressive, 345-351
  sudden, 351-362
Chart construction, rules for (see specific type of chart)
Chart proportions, 81-83
Charts, types of (see also specific types of charts), 68-69
Chebycheff's (Tchebycheff) inequality, 221
Chi-square:
  alternative exact methods, 683, 686-689
  curves of, 682
  degrees of freedom for, 681, 685-686, 690-691, 699
  distribution of, 683
  relation to coefficient of mean square contingency, 481n
  relation to normal, t, and F distributions, 725-721
  table of values, 752-753
  used as "goodness of fit" test, 695-691
  used to obtain confidence limits of σ², 701-702
  used to test significance of σ² or σ, 699-701
  used with 1 × 2 tables, 681-683
  used with 1 × R tables, 689-691
  used with 2 × 2 tables, 684-686
  used with 2 × 3 and larger tables, 691-693
  used with variances, 699-702
  when same as p − π test, 681-683
  when same as p₁ − p₂ test, 684-685
Circle diagrams, 126-130
Classification:
  bases of, 3-5
  chronological, 4
  concealed, 11
  geographical, 4-6
  qualitative, 3
  quantitative, 3-4
Clopper, C. J., 678-679
Cluster sample, 28-29
Cochran, W. G., 30n, 31n, 691
Coefficient of:
  alienation, 463n
  correlation (see Determination, coefficient of)
  determination (see Determination, coefficient of)
  kurtosis, 232-236
  likelihood (see L)
  mean square contingency, 481-482
  net estimation, 553
  non-determination, 463
  separate determination, 558-559
  similarity, 677n
  skewness, 225-232
  variation, 222-225
Collection of data:
  general plan, 17
  methods:
    enumeration, 16, 34-36
    mail, 16, 18, 35-36
    registration, 16
  procedure outlined, 16
  sample, selection of, 26-34
  schedule:
    editing, 36-37
    making, 18-25
    organizing data from, 37-46
    use of, 34-36
Commodity Prices, Wholesale, Index of, 398-400, 438-439



Common logarithms:
  explanation, 776
  table of, 777-791
Common Stock Prices, Index of, 441-442
Component-part charts:
  bar charts, 125-130
  line diagrams, 90-92
  pie diagrams, 126-130
Compound interest curve, OOn, 202, 290-294
Confidence limits of:
  arithmetic means, 648-650, 653-654
  coefficients of determination, two-variable linear, 725
  correlation coefficients, two-variable linear, 725
  proportions, 671-679
  standard deviations, 701-702
  variances, 701-702
Consumer Price Index, 394, 436-438
Contingency, coefficient of mean square, 481-482
Continuous variable, 101
Coordinates for charts, 81
Correlation:
  and averages, 472-473
  and causation, 469-470
  and explained variation, 461-465
  and heterogeneity, 470-472
  and measurement of lag (see Lag)
  coefficient of (see Determination, coefficient of)
  effect of grouping, 477-478
  first moment correlation, 577
  meaning of, 451-454
  means, use of (see Correlation ratio)
  multiple (see Multiple correlation)
  non-linear (see Non-linear correlation)
  of time series (see Time series correlation)
  partial (see Partial correlation)
  Pearsonian formula (see product-moment formula, below)
  population estimate of coefficient (see Population estimate)
  product-moment formula, 465-469
  qualitative distributions, 480-482, 524
  ranked data, 478-480
  reliability of measures, 722-736
  two-variable linear:
    grouped data, 473-478
    ungrouped data, 466-469
Correlation ratio, 520-524
  estimate of value in population, 732
  limitations of, 524
  significance tests, 730-732
Cosgrove, Jessica, 7n
Cowden, D. J., 170n, 183n, 454n, 470n, 738n, 818n
Cox, H., 15
Criterion of fit, general, 262
  equal areas, 262
  in Glover's method, 291n
  least squares, 265-275, 796-798
  partial sums, 302, 309, 310
  selected points, 310, 315
Criterion of likelihood (see L)
Criterion of significance, choice of, 640-641
Crow, Carl, 35n
Crowder, W. F., 238n, 616n
Croxton, F. E., 117n, 126n, 127n, 149n, 162n, 183n, 328n, 453, 464n, 470n,
    471, 504n, 588n, 594n, 666n, 704n, 738n, 748, 749, 818n
Crum, W. L., 388n
Curves, for presenting data:
  axes, 68-71
  base line, 80
  chart proportions, 81-84
  compared with bar charts, 92, 122-123, 129-130
  coordinates, 81
  lettering, 84
  of frequency distributions, 73-75, 162-170
  origin, 70
  quadrants, 68, 70
  ruling, 80-81
  scale labels, 84
  source, 85
  title, 84-85
  use of vertical scale break, 77
  zero on vertical scale, 76-80
Curve type, selection of, 280, 318-319
Curvilinear correlation (see Non-linear correlation)
Cycle chart, 387
Cyclical movements:
  comparison of, 381-387, 578-585
  correlation of, 578-585
  explained, 249-251
  methods of isolating:
    direct, 388
    harmonic analysis, 388
    reference-cycle analysis, 388-392
    residual, 367, 373-382
    specific-cycle analysis, 392

D

Data, statistical (see also Index numbers, data for):
analysis of, 3-6
classification of, 3-5
collection of, 2-3, 15-45
comparability of, 48-49
insufficient, 10
interpretation of, 6
meaning of, 1
period data, 71-72
point data, 71-73
presentation of:
by charts, 67-135 (see also Charts)
by semi-tabular device, 51-52
bv tables, ->1 5() also Tables, 
slat n ai < 
by text, 50-51
sources of, 45-49, 823-829
tabulation of, 37-45
Davies, G. R., 238n, 577n, 616n, 619
Death rates, 143-144
Deciles, 187-189
Deflating, 257-258, 394
Degrees of freedom for:
analysis of variance, 709, 713, 718
chi-square tables, 681, 685-686, 690-691
tests of correlation measures:
iiuiltipie, 73 ^ 
non-linear, 727-732
partial, 727, 734-735
two-variable linear, 723
tests of differences between:
means of two independent samples, 653
means of two non-independent samples, 656
Degrees of freedom for (cont.):
tests of differences between (cont.):
sample mean and population mean, 645
sample variance and population variance, 700

several means (see analysis of variance. 
above) 

two sample variances (see also analysis 
of variance, above), 702 
De Moivre, A., 590 
Demonstrations of formulas, 792-817 
Densities (see Frequency densities)
Dependent variable (see Variable)
Determination, coefficient of: 
multiple:
effect of additional variables on, 534
four or more independent variables, 550, 556
significance tests, 732-734
three independent variables, 548, 556
two independent variables, 534, 543, 546n, 555-556
multiple-partial, 551
non-linear:
second-degree curve, 490-491
significance tests, 730-732
straight line to logarithms, 511, 519
straight line to reciprocals, 520
straight line to square roots, 514
third-degree curve, 497
partial:
first order, 534-535, 543-545, 552-554
second order, 549, 554-555
significance tests, 734-736
third or higher order, 550-551, 555
two-variable linear, 461-466, 468-469, 491-493, 540
confidence limits, 726
significance tests, 722-726
Determination, coefficient of, estimate of population value (see Population, estimate of)
Determination, coefficients of separate, 558-559

Diagram (see specific type of chart; Scatter diagram)
Discrete variable, 161
Dispersion:
absolute, 213-222
graphic illustration, 212
relative, 222-225
graphic illustration, 224
Doolittle, M. H., 498
Doolittle method, 498-503, 549n
Double logarithmic paper (see Logarithmic chart)

Do>Ie. R. P., 650, 702 
Duncan, A. J., 641n

E 

Easter, adjustment for, 352-359

'Eaton, E. 1,, 4,38n 

Edgeworth, F. Y., 410 

Editing schedules, 36-37

Edmunds, Harriet, 113n 

Elderton, W. P., 613n

Electronic statistical machine, 42-46 

Elmer, M. C., 13n

Emphasis, obtaining of, in tables, 66 

Entry form, 157-159 


Enumeration, 16 

Equation type, fitness of, 280, 318-319, 516-518
Errors:
Type I, 639
Type II, 639, 719n
Estimated standard error (see Standard error, estimated)
Estimating equation:
multiple correlation:
four or more independent variables, 549
three independent variables, 546-547, 548
two independent variables, 533, 542-543, 548-549
multiple curvilinear correlation, 559-560
non-linear correlation:

sec'oiifi-rlf»greo curve. 480 
straight line to logarithms, 503-504, 508-509, 512, 518

straiglit line to i-c'ciproeiils, 605 5nt; 
Ml* 

I 'jlraight line to souare roots, 504 606 

,>•.< .616 

tfoid degree cii.'ve, 4>>6, 193 
two-variable linear correlation:
grouped data, 477
ungrouped data, 454-458, 466, 467, 491, 539
Estimation, net coefficient of, 533
Explained variation in:
multiple correlation:
four or more independent variables, 550
three independent variables, 547
two independent variables, 534, 543
non-linear correlation:
correlation ratio, 521-522
second-degree curve, 490
straight line to logarithms, 510-511, 518
straight line to reciprocals, 520
straight line to square roots, 514
third-degree curve, 497
two-variable linear correlation, 461-464, 468, 491, 539-540
Exponential curve:
fitting, 290-294

p:op«’rtieH of, 290 -uOl 
modified, 298-302
properties of, 298-299
Ezekiel, M., 559n

F


curves of, 703
definition of, 702-703
distribution of, 704
inclusive of normal, chi-square, and t distributions, 720-721
table of values of, 758-762
used:
in analysis of variance, 709-711, 713-714, 718-720
in correlation tests, 723n, 728-735
to test significance of difference between two estimated variances, 702-704
F^, table of values of, 747
Factor reversal test, 427-428
Falkner, Helen D., 325n
Farm housing in Oklahoma, index of,
446-447 

Federal Reserve Index of Industrial 
Production, 442-444 
Ferber, R., 30n, 482n 
Ferger, W. F., 420, 430
Fiducial limits (see Confidence limits)
Fifth-degree curve (see Polynomial series)
Finney, D. J., 687n
First moment correlation, 577n
First-order partial correlation coefficients, 543-545, 552-554
Fisher. A., (jPilti 

Fisher, I. (see also "Ideal" index number), 403n, 411, 412n, 427, 428, 429

Fisher, R. A., 27n, 289n, 724n, 750, 753, 759

Flexible calendar of working days, 738-739

Foot^, U, J., 347n 
Footnotes in tables, 61 
Forecasting, 111-112, 309-310, 310, 579-585

Fourth-degree curve (see Polynomial series)

A , 347n 

Frequency curves (see also Binomials):
fitting of, 594-623
graphic comparison of, 166-168
ogives, 168-170
plotting of, 73-76, 162-166
types of:
bimodal, 191-192
reverse J, 164
skewed, 162-164
symmetrical, 163
Frequency densities, 92, 164-166
Frequency distribution:
classes:
and method of reporting values, 160-162
locating mid-values, 160-162, 177-178
number and limits, 159-162
open-end, 105, 194-195
points of concentration, 161-162
comparison of frequency distributions:
different class intervals, 167
different sample sizes, 166-167
construction, 150-159
cumulative, 168-170
curves:
on arithmetic paper, 73-75, 168-170
on arithmetic probability paper, 607
on logarithmic probability paper, 615
using logarithmic horizontal scale, 614
plotting, 73-75, 164-166, 168-170
plotting when classes are unequal, 164-166
Frequency distribution and range chart, 92

Funkhouser, H. G., 68n

G 

Gallup, G. H., 32n 

Galton, Sir F., 454n 

Garfield. F. R., 235 

Gauss, J. K. F., 590

Gaussian curve (see Normal curve)


General table, 53 

Generic differences versus statistical differences, 657-658
Gentile, Marion C., 732n
Geometric mean:
compared with arithmetic mean, 199-201, 208-209, 418-420, 704-706, 793
compared with harmonic mean, 209, 793-794
definition of, 198
from grouped data, 199, 616
from ungrouped data, 198-199
properties of, 198-200
uses of:
averaging ratios, 201
finding rate of change, 201-203
in index numbers, 411, 418-420
in skewed distributions, 201, 616
Geometric progression (see also Compound interest curve; Exponential curve):

logarithms of, pbdte<b \} s 
plotted on arithmetic grid, 94
plotted on semi-logarithmic grid, 99
I orop-,< ties of, 9 1 

Glover, J. W., 291n
Gompertz curve, 302-310
j as “law” of Kr<ovtti .s0*J -3j 6 
charts of characteristic shapes, 303
comparison with logistic, 316-318
first differences of, 317
fitting of, 302-309
properties of, 302-303
Gordon, H. A., 253n
Gram-Charlier series, 619n
Graphic method, advantages and limitations of, 67-68

Graphic presentation (see type of chart)

Gresaeris, 0 , 577n 
GrosHinan, H. A., 092 
Grove, R. D., 144n, 243 
Growth curves, asymptotic (see Modified exponential; Gompertz; Logistic)
GudfufO b P. 447n, 472n 

H 

Hansen, M. H., 28n, 30n
Haphazard sample, 32-33
Harding, P. L., 655
Harmonic analysis of time series, 388
Harmonic mean:
compared with arithmetic mean, 204-207, 208-209
compared with geometric mean, 209, 793-794
computation of, 203
definition of, 203
properties of, 203
uses of:
averaging prices during crop year, 207-208
in index numbers, 416n, 420, 430
in skewed distributions, 207
numerator-term weights, 204-207
Hartley, H. O., 688n, 741, 745, 747, 753, 759, 764, 765
lOnm, M. H , 216 
Hog-corn ratio, 109-111, 146 
Holmes, B. E., 452
Hood, W. M., 235
Hotelling, H., 228n, 724n
Hurwitz, W. N., 28n, 30n
Hypothesis, null (see Null hypothesis)

I

"Ideal" index number:
criticism of, 411-412
factor reversal test, 427-428
formula, 411
time reversal test, 427
Improprieties (see also Percentages, faulty use of):
bias, 7
carelessness, 8-9
concealed classification, 11
confusion of association and causation, 9-10, 469-470
failure to define units, 11
insufficient data, 10
misleading totals, 11-12
non-comparable data, 9
non-sequitur, 9
omission of important factor, 7-8
poorly designed experiment, 12
unrepresentative data, 10-11
Independent variable (see Variable) 

Index numbers:
aggregative:
price (see also Aggregative price index numbers), 405-411
quantity (see also Aggregative quantity index numbers), 421-423
average of relatives:
price, 414-420
quantity, 423-424
bases, 404-405
behavior of relatives, 398-400
chain, 431-432
changing weights, 431 13<*
comparison of formulas, 420-421
contrasted with relatives, 396-397
data for, 401-404
definition of, 394
descriptions of:
Agricultural Marketing Service Indexes of Prices Paid by and Received by Farmers and Parity Ratio, 439-441
A. T. and T. Index of Industrial Activity, 444
Bureau of Labor Statistics: Consumer Price Index, 436-438; Wholesale Commodity Prices, 438-439
Business Activity in Pittsburgh, 445-446
Farm Housing in Oklahoma, 446-447
Federal Reserve Index of Industrial Production, 442-444
N. Y. Times Weekly Index of Business Activity, 444-445
S. E. C. Index of Common Stock Prices, 441-442
mathematical tests, 426-428
price, 405-421
problems, 397-398
quantity, 421-424
substituting, adding, or dropping commodities, 430
uses of, 394-396
weighting schemes, 408-413, 415-417


Industrial activity, index of, 444
Industrial production, index of, 442-444
Inference, statistical (see Significance tests; Confidence limits)

Inspection trend, 202, 318, 341-347
Irregular variations:
computation of, 382-384
curves of, 383-384
explained, 251
smoothing of, 380-382

J 

.lahofli, \I.oic I. In 
Jessop, W N . U»0u 

K 

Kan a. A. J , 031 (>32 
Karpiiios. B 1) . r>92 
Kollcv , T L . 553n 

Kendall, M. tr., 4cs()ii, ,594n. f(38ri riolu 

KeviU'.s, J. M , lOSn. 411, 429 

Key punch (see Card punch)

King, W. I., 194, 428n

Koffsky, N. M., 439n

Kurtosis:
graphic illustrations of, 213, 233-234
measure of, 232-236
significance test, 722
Kurtz, E H.. 230 

L 

L: 

description, 705
table ot V due. o! . Tt)!! 
used to compare several variances, 704-706

La<‘e\ , () L., 12ii 
Lag, measurement of, 579
use in forecasting, 582-585
Laspeyres, E., 409
Latsr ha, K , b87n 
Latter, O. H., 707 
Lrcid, rneasurernrmt of, 579 5S.) 

Use in foref asting, 5S2 o'sO 
Least squares, 265-270, 798-799
Leptokurtic distribution, 213, 232-235, 382, 722

Lettering of charts, 84
l>ev, J , 30n, (i2t)n, tiiiSn. 720u 
Ia*wus, It. E., 352n 
Lewis, T., 753 

Likelihood, criterion of (see L)

Linder, F. E., 144n, 243
Link relatives, 339

Literary Digest, sampling method of, 10-11, 33

Logarithmic chart, grid, or paper:

logarithmic hor;/f>nt.i] s- aU*. OM, (>i '# 
logarithmic horizontal and vertical scales, 505

logaiitlimic v«*rncal s< ale, 9S 110, 215, 
292, 291, 300, 504 
semi-logarithmic chart, 98-116
Logarithmic normal curve, fitting of, 613-619

Logarithmic probability paper, 615
Logarithms, common:
explanation, 770
table of, 777-791
Logistic curve, 310-316

as "law” of poptilation growth, 31 n 31 d 
comparison with Gompertz, 316-318
first differences of, 317
fitting of:
by method of selected points, 310-315
by u.se of i cMpi ocal.^,, 310 
properties of, 310
series of, 313 310 
skewcil, 310 
Long cycles, 253
Lowciistcm, 13 , 120 

M 

MacDonald, 455 
Madow, W. G., 28n, 30n
Mahalanobis, P. C., 703
Map (see Statistical map)

Marshall, A., 410
Marshall-Edgeworth formula, 410
Martin, J. H., 523
Mathematical proofs, 792-817
Mather, K., 721

Maximum variation chart, 85-87
McMillan, R. T., 146

Mean (see Arithmetic mean; Geometric mean; Harmonic mean; Quadratic mean)

M(,. ’ 'M'latjon, 21 .5 

Mean square contingency, coefficient of, 481-482
Median:
definition of, 185
graphic location:
frequency curve, 193
ogive, 187-188
grouped data, 186-187
ungrouped data, 185-186
use in index numbers, 420
use in seasonal, 325
Merrington, Maxine, 750, 759
Mesokurtic di.sti ibutions, 213, 2.32, 234. i»t)7 
Xfiller, A. H , 7l0n 
Miner, .1. K , .553n 

Minor means (see Geometric mean; Harmonic mean; Quadratic mean)
Misuses (see Improprieties)

W C. 2.50 3S9n, 392n 

Mode:
betas used in computation of, 190n
definition of, 189
graphic location:
column diagram, 191
frequency curve, 191, 193
ogive, 191
grouped data, 190-192
ungrouped data, 189-190
Modified exponential curve:
charts of characteristic shapes, 299
fitting of, 298-302
formulas for constants, 302, 799-800
properties of, 298-299
Modified mean:
forms of, 182-183
use in computing seasonal index, 322-323, 335-339
Modley, R., 126
Moments:
correction of for grouping error, 237-239
when applicable, 238, 621n
first moment, 229, 237


Moments (cont.):
fourth moment, 232-236, 237-239
second moment, 231, 237-239
third moment, 229-232, 237
Mood, A. M., f)4f)n, 7i9n 
Moore, Ci IT , 3S9n. 399 
Mfisteller, K.. 31 n 
Moiizoii, Vj. D . Jr., 577n 
Moving averages:
irregular movements, smoothing of, 380-382
seasonal index, used in computing, 328-334
Moving seasonal, 349-351
Mudgett, B. D., 420
Multiple-axis chart, 90 (see also Year-over-year chart)

Multiple correlation:
and explained variation, 534, 543, 548
coefficient derived from simple and partial coefficients, 550, 556
coefficient derived from simple coefficients, 546n, 555-556
curvilinear, 559-560
effect of additional variables on, 534
effect of intercorrelation on, 545-546
estimating equations (see Estimating equation)
four or more independent variables, 549-551
importance of individual independent variables, 557-559
m variables, 549-551
meaning of, .5 H .5,M 
multiple-partial, 551
net coefficients of estimation, 533, 542, 547, 550, 557
non-linear, 559-560
normal equations (see Normal equations in correlation)
population estimate of coefficients, 734
regarded as simple correlation, 551
significance tests of coefficients, 732-734
standard error of estimate (see Standard error of estimate)
three independent variables, 546-549
time as an independent variable, 573-575
two independent variables, 541-543
Multiple determination, coefficient of (see Determination, coefficient of)
Multi-stage sample, 29

N 

N'air, K. H.. 31 1 ri 
N’a\er, I* P 763 
Net balance chart, 85
Net correlation (see Partial correlation)
Newhall, S. M., 216
N. Y. Times Weekly Index of Business Activity, 444-445
Neuman, J., 7l)ln , 

Non-determimP ion, coefficient of, 4i>.i 
Non-linear correlation:
logarithms used, 503-504, 508-512, 518-519
means used, 520-524
multiple, 559-560
population estimate of coefficient, 729-730, 732
reciprocals used, 505-506, 519-520
second-degree curve used, 486-491
Non-linear correlation (cont.):
significance tests of coefficients, 726-732
square roots used, 504-505, 513-516
third-degree curve used, 493-498
Normal, in time series, 380
Normal curve of error (see Normal curve)
Normal curve or distribution (see also Logarithmic normal curve):
and binomial, 591-594
and significance tests, 626-642, 663-667, 670-671, 673-675, 679-680, 681-683, 684-686, 724-725, 735
development from laws of chance, 591-594
fitting of:
areas, 599-603, 603-606
ordinates, 596-599
formula for, 594, 595
historical development of, 590-591
relation to chi-square, t, and F distributions, 720-721
table of ordinates, 744-745
tables of areas, 746, 748, 749
testing suitability of, 606-607, 690-691
Normal equations explained, 265-270, 798-799

Normal equations in correlation: 
multiple correlation:
four or more independent variables, 550
three independent variables, 546-547
two independent variables, 542
non-linear correlation:
second-degree curve, 486-489
straight line to logarithms, 508, 518
straight line to reciprocals, 519
straight line to square roots, 513-514
third-degree curve, 494-497
two-variable linear correlation:
grouped data, 475n
ungrouped data, 456-457, 467, 491, 539

Normal equations in time series:
second-degree curve, 285-286
second-degree curve to logarithms, 295
straight line, 270-272
straight line to logarithms, 292
third-degree curve, 288
Normal probability curve (see Normal curve)

Null hypothesis, 637
not proven or disproven, 637

O 

Observation equations, 267-270
Ogive, 168-170, 187-188, 191, 606
Orthogonal polynomials, 289-290

P 

Paasche, H., 410 
Pans, J. n., 394n, 420a 
Parity index, 395, 439-441 
Parity ratio, 440-441
Parkes, A. 8,, CIO 
Parten, Mildred B., 13n 
Partial correlation:
and explained variation, 534-536, 543-545, 549
and net coefficient of estimation, 535


Partial correlation (cont.):
coefficient derived from lower-order coefficients, 552-555
first-order coefficients, 543-545, 552-554
four or more independent variables, 550-551
meaning of, 534-536
population estimate of coefficient, 736
regarded as simple correlation, 551
second-order coefficients, 549, 554
significance tests of coefficients, 734-735
third or higher-order coefficients, 555
three independent variables, 549, 554
time as an independent variable, 573-575
two independent variables, 543-546, 552-554
used in two-variable non-linear correlation, 493n

Partial determination, coefficient of (see 
Determination, coefficient of) 

Pearl, R., 311n, 315

Pearl-Reed curve (see also Logistic curve), 310-316

Pearson, E. S., 651n, 678-679, 688n, 704n, 721, 741, 745, 747, 753, 759, 761, 706
Pearson, K., 227, 451, 590n, 622, 741, 745, 747

Percentage frequency distribution, 166-168 
Percentages (see also Proportions; Rates; Ratios):
averaging of, 151-152, 183-184, 680
faulty use of, 149-152
hundred per cent statement, 147-148
rounding to total 100 per cent, 61-62, 140
significance tests, 661-680
Percentile measure of:
dispersion, 214
skewness, 229
Percentiles, 187-189
Period data, 71-73
Periodic curve, 388

Periodic movements (see also Seasonal movements; Seasonal indexes):
explained, 246-249
intra-year indexes (see Seasonal indexes)
types of, 246, 249
Peters, C. C., 237n, 482n
Physical volume of business activity, indexes of, 442-446
Pictographs, 123-126
Pie diagrams, 126-130

Pittsburgh business activity, index of, 445-446

Platykurtic distributions, 213, 232, 234-236, 722

Playfair, W., 68n
Point data, 71-73

Poisson distribution, 594n

Polynomial series:
as estimating equation in correlation:
second degree, 486-491
straight line, 455-458, 491-493
straight line to logarithms, 503-504, 508-512, 518-519
straight line to reciprocals, 505-506, 519-520
straight line to square roots, 504-505, 513-516
third degree, 493-498
Polynomial series (cont.):
as trend in time series:
fifth degree, 282-283
fourth degree, 282-283
second degree, 282-283, 286-288
second degree to logarithms, 295-297
straight line, 263-275
straight line to logarithms, 290-294
third degree, 282-283, 288-289
orthogonal, 289-290

Population, estimate of (see also Confidence limits):
coefficients of determination:
multiple, 734
non-linear, 729-730, 732
partial, 736
two-variable linear, 725-726
correlation coefficient (see coefficients of determination, above)
proportion, 680
standard deviation, 644, 810-812
variance, 644, 810-812
Population changes, adjustment for, 2.59- 
257 

Powers of natural (and odd natural) numbers, sums of, 740-743
Precision, measure of, 221-222
Prefatory notes in tables, 61
Prescott, R. B., 243n, 309n
Presentation of data (see Data, statistical, presentation of)

Price changes, adjustment for, 257-258, 394
Price index numbers (see Aggregative price index numbers; Index numbers)
Price relatives:
behavior of, 398-400
contrasted with index numbers, 396-397
definition of, 414
used to construct index numbers, 414-417
Prices paid by and received by farmers, indexes of, 439-441
Primary source, 15-49
Probability paper:
arithmetic, 607
logarithmic, 615
Proofs, mathematical, 792-817
Proportions, chart, 81-83
Protractor, percentage, 127, 129
Punch card, 42, 44
Purposive sample, 31

Q 

Quadrants, for curve plotting, 68, 70 

Quadratic mean, 209

Qualitative distributions, correlation of, 480-482, 524

Quality, control of, 30, 643
Quantity index numbers (see Aggregative quantity index numbers; Index numbers)

Quantity relatives, used to construct index numbers, 423-424
Quartile deviation, 215
Quartile measure of:
dispersion, 215
skewness, 229
Quartiles, 187-189
Questionnaire, 16 
Quintiles, 187-189 
Quota sample, 31 


R 

Randall, C. K., 439n
Random point sample, 31
Random sample, 26-27, 626
Range, 214
Range charts, 87

Ranked data, correlation of, 478-480
Rates:
birth, 144
death, 143-144
use of term, 130n

Ratio chart (see Semi-logarithmic chart)
Ratio of determination (square of correlation ratio), 522

Ratios (see also Percentages; Proportions; Rates):
averaging:
arithmetically, 151-152, 183-184, 680
arithmetic versus geometric mean, 200-201, 418-420
calculation of, 136-138
effect of changing base, 138-139
faulty use of percentages, 149-152
illustrations of use, 141-149
recording percentages, 139-140
types of, 140-141
Reciprocals, table of, 766-775
Reed, L. J., 315

Reference-cycle analysis, 388-392
Reference table, 53
Registration, 16

Reliability (see Significance tests)

Reproduction of charts, 81

Research methods, 12-14

Reverse J curve, 164

Rietz, H. L., 613n

Roniig, 11. G., 66*>n, (>68n 

Ross, F. A., 743

Ross, J. E., 447n

Rounding, 139-140, 818-822

Rugg, H. O., 482n, 745, 746

Ruling: 

curves, 80-81 
tables, 64 

S 

Sample (see also Significance tests):
as used by or in:
American Institute of Public Opinion, 32
Census of Manufacturing, 25-26
index numbers, 402-404
Literary Digest, 10-11, 33
bias in, 31, 33-34
test of stability misleading, 33
types of samples:
area, 29
cluster, 28-29
haphazard, 32-33
multi-stage, 29
purposive, 31
quota, 31
random, 26-27
random point, 31
sequential, 30-31
stratified, 29-30
systematic, 27-28

Sample values, tests of (see Significance 
tests) 

Sasuly, M., 388n
Scale labels, 84

Scatter, zones of (see Standard error of estimate)

Scatter diagram, 451-462, 466-467

Scatter ratio, 512

Schedule:
editing, 36-37
illustrations, 19-21
making, 18-25
meaning of term, 16
tabulating from, 37-45
use of, 34-36

Sehumiu’ber. b. X.. 457, 4SS 
Score sheet. 40 

Seasonal indexes (see also Seasonal movements):
amplitude adjustment, 360-361
changing, 349-351
combination types, 362
constant:
link relative, Persons, or Harvard, 339
per cent of moving average, 326-339
per cent of trend, or Falkner, 324-326
continuity of, 361-362
Easter adjustment, 352-359
logical basis, 362-363
moving, 349-351
stable (see constant, above)
sudden changes in, 359
tests of, 339, 372
timing, short time shifts in, 359-360
Seasonal movements (see also Periodic movements):
adjustment for:
by division, 367-371
by subtraction, 372-373
nature of, 246-248
reasons for interest in, 248-249
types of (see also Seasonal indexes), 246-248

Seasonal variation (see Seasonal movements)

Secondary source, 45-49
Secondary trend, 253

Second-degree curve (see Polynomial series)
Second-order partial correlation coefficients, 549, 554-555
Seeri->1, II., 4S2n 
Secular trend (see Trend)

S. E. C. Index of Common Stock Prices, 441-442

Selected points, for fitting logistic curve, 310-315

Semi-interquartile range, 215
Semi-logarithmic chart (see also Logarithmic chart):
applications of, 105-113
construction of scale, 100-102, 113-116
cycles, 100-101, 113
expansion and contraction of scale, 113-114
explained, 98-102
interpretation of, 103-105
principles of construction, 100-102, 113, 116
purpose of, 93, 98
Semi-tabular presentation, 50-61
Sequential sampling, 30-31
Sheppard's corrections, 237-239, 621n
Shewhart, W. A., 238n, 619n, 621n, 627-631, 702, 747


Significance: 

and value of P, 638-641 
criterion of, 640-641
level of, 635

Significance ratio, 637, 645
Significance tests (see also Confidence limits):
analysis of variance (see Analysis of variance)
chi-square, 681-693, 699-702
errors in, 639
F (see F)
likelihood, criterion of (see L)
of difference between observed and computed frequencies, 679-680, 684-693
of difference between observed and population frequencies, 661-679, 681-684
of difference between sample and population values:
arithmetic means, 635-650
betas, 720-722
coefficients of determination, 722-724, 725-736
correlation coefficients, 722-724, 725-736
proportions, 661-679, 681-684
standard deviations, 699-702
variances, 699-702
of difference between two sample values:
arithmetic means, independent samples, 651-654
arithmetic means, non-independent samples, 654-657
coefficients of determination, 724-725
correlation coefficients, 724-725
proportions, 679-680, 684-693
standard deviations, 702-704
variances, 702-704, 706-720
of several variances, 704-706
of slope of linear estimating equation, 723
one tail versus two tails, 637-638
t (see t)
variance, analysis of (see Analysis of variance)
z (see z transformation)

Significant digits, 818-822
Silhouette chart, 85-86

Simple correlation (see Two-variable linear correlation)
Sine-cosine curve, 388
Skewed curve:
fitting of by use of logarithms, 613-619
fitting of normal curve with adjustment for skewness, 619-623
Skewness:
absolute versus relative, 227
meaning of, 225
charts, 212, 226
measures of relative:
Pearsonian, 227-229
using percentiles, 229
using quartiles, 229
using third moment, 229-232
significance test, 720-722
Smalley, C, W., 249n 
Small-number metlioda, 667 
Smith, J. G., 641n
Snedecor, G. W., 160n
Soloraone, L. M., 228n
Sorter, electric (see Electronic statistical machine)

Source note:
of chart, 85
of table, 61
Sources of data:
comparability of, 48-49
primary, 45
secondary, 45-46
selected list of, 823-829
suitability of, 46-47
Spear, Mary E., 76n

Spearman rank correlation coefficient, 478-480

Specific-cycle analysis, 392 
Spillman, W. 3 ., 49(i 
Square roots, table of, 766-775
Squares, table of, 766-775
Stamp, Sir J., 10n, 150n
Stanberry, Van B., 112n
Standard deviation:
and area under normal curve, 219-221, 746, 748, 749
correlation when in terms of, 465n, 571-573
grouped data, 217-220
of population, 217, 633
of population, estimated value, 217, 644, 651-652
of sample, 217, 644
properties of, 219-222
ungrouped data, 215-217
used in comparing cyclical movements, 384-387
Standard error:
of arithmetic mean, 633, 807-810
of difference between two arithmetic means, 651, 812-813, 814
of difference between two proportions, 680
of proportion, 665, 814-815
of z, 724, 735

Standard error, estimated:
of arithmetic mean, 644
of difference between two arithmetic means, 652
of difference between two proportions, 680

Standard error of estimate:
multiple correlation:
effect of additional variables on, 543, 548
four or more independent variables, 550
three independent variables, 548
two independent variables, 534, 543
non-linear correlation:
second-degree curve, 490
straight line to logarithms, 511-512, 519
straight line to reciprocals, 520
straight line to square roots, 515
third-degree curve, 498
two-variable linear correlation:
grouped data, 477
ungrouped data, 454, 45S 46 j -^ 68, 
492, 540 

Standard scores, 224-225, 571
Statistical data (see Data, statistical)
Statistical differences versus generic differences, 657-658

Statistical inference (see Significance tests)


Statistical map: 
dot, 131-133
hatched, 131
pin, 132-134

Statistical method, 1-2, 12
Statistical reports, tables in, 65-66
Statistical tables (see Tables, statistical)
Statistics:
definition of, 1
origin of, 2
Stall her, B K., 439n 
Stein, fl , 1 17n 
Stencils for lettering, 84
Stewart, Leonora, 596
Stone, U. E , t47n 
Straight-line trend:
equation explained, 263-265
least-squares fit:
adapting equation to monthly data, 275-278
even number of years, 273-275
fitted to logarithms, 290-294
normal equations, 267-270, 798-799
observation equations, 267-270
odd number of years, 270-273
reason for use of, 265-270
Stratified sample, 29-30
Str\ ker K. E . 1 26n 
Stuart, \ , 69. hi 
Student (W. S. Gosset), 751
Summary table, 53-54

Sums of powers of natural numbers, 740-741

Sums of powers of odd natural numbers, 742-743

Systematic sample, 27-28

T 

t:
and significance test for arithmetic means, 645, 653, 656
and significance test for correlation coefficients, 722-723, 727, 729, 734-735
and significance test for slope of linear estimating equation, 723
curves of, 646
distribution of, 646
relation to normal, chi-square, and F distributions, 720-721
table of values of, 750-751
Tables, statistical:
arrangement of entries, 56-60
comparisons, 54-55
emphasis, 56
footnotes, 61
guiding the eye, 65
percentages, use of, 61-62
prefatory notes, 61
reproduction of, in reports, 66
rounding numbers, 62-63
ruling, 64-65
size and shape, 63-64
source notes, 61
title and identification, 60
totals, 63
type, size and style, 65
types of, 53
typewritten, 65-66
units, 63

Tabular presentation (see Tables, statistical)
Tabulation:
hand sorting, 30
mechanical, 39-45
score or tally sheet, 37-39
Tabulator, electric (see Electronic statistical machine)

Tally sheet, 40
Tchebycheff's inequality, 221
Text table, 53-54

Third-degree curve (see Polynomial series)
Thompson, Catherine M., 753, 759
Thorson, G., 595 

Time element in correlation (see Time series 
correlation) 

Time reversal test, 427 
Time series: 

correlation of (see Time series correlation)
movements in:
cyclical, 249-251, 373-382, 384-387
irregular, 251, 382-384
long cycles, 253
periodic, 246-249, Ch. 14, Ch. 16
trend, secondary, 253
trend, secular, 240-246, Ch. 12, Ch. 13
plotting of, 71-73
preliminary treatment of data, 253-259
Time series correlation (see also Lag):

adjusting for trend and seasonal by use of cyclical relatives, 578-585
adjusting for trend by use of:
absolute deviations from trend, 575
first differences, 575-576
percentage differences, 575-576
percentages of trend, 563-573
equivalence of use of absolute deviations and partial correlation, 575-576
problems involved, 576-578
unadjusted data, 562-563
use of multiple and partial correlation, 573-575

Tippett, L. H. C., 638n, 710n
Title: 

of chart, 84-86
of table, 60
Totals in table, 63
Total variation in: 

analysis of variance, 708-709, 712, 714
multiple correlation:
four or more independent variables, 550
three independent variables, 547
two independent variables, 534, 543
non-linear correlation:
correlation ratio, 521-522
second-degree curve, 490
straight line to logarithms, 509-510, 518
straight line to reciprocals, 519-520
straight line to square roots, 514
third-degree curve, 497
two-variable linear correlation, 462-464, 467, 491, 539-540

Trend:
adjustment for, 365-367, 378
empirical test of data, 318-319
explained, 240-246
fitting of:
asymptotic growth curves, 297-318
Gompertz, 302-310
inspection trend, 202, 318
logistic, 310 310
modified exponential, 298-302
polynomials (see Polynomial series)


Trend (cont.):
inter-cycle, 389
intra-cycle, 389
nature of, 240-246
secondary, 253
secular, 240-246, Ch. 12, Ch. 13
selection of period, 278-280
selection of type, 318-319
Tukey, M. W., 31
Two-variable linear correlation:

coefficient of correlation and slope of estimating equation, 465-466
coefficient of determination:
and explained variation, 461-464
and proportions of common factors, 464n
concepts, 451-455
estimating equation, 455-458
grouped data, 473-478
normal equations, 455-457
population estimate of coefficients, 725-726
product-moment formula, 465-466
qualitative data, 480-482
ranked data, 478-480
results compared with:
multiple correlation, 543, 548
non-linear correlation, 491-493
partial correlation, 545, 553-554
scatter diagram, 451-452, 466-467
significance tests, 722-726
standard error of estimate, 458-461
ungrouped data, 4uJ'' 409

Type I and Type II errors, 639, 719n
Typewriter, use of:
in chart lettering, 84
in table construction, 65-66

U

Unbiased estimate (see Population estimate)
Unexplained variation in:
multiple correlation:
four or more independent variables, 550
three independent variables, 547
two independent variables, 534, 543
non-linear correlation:
second-degree curve, 490
straight line to logarithms, 511, 519
straight line to reciprocals, 520
straight line to square roots, 514
third-degree curve, 497
two-variable linear correlation, 461-462, 468, 492, 539-540
Units, how shown in table, 63
U.S. Bureau of Labor Statistics indexes:
consumer prices, 394, 436-438
wholesale commodity prices, 398-400, 438-439

V 

Van Voorhis, W. R., 237n, 482n
Variable: 

continuous and discrete, 161 
independent and dependent, 452, 532
Variance: 

analysis of (see Analysis of variance)
Variance (cont.):
of population, 217, 634
of population, estimated from:
column means, 709-710, 713-714, 718-720
interaction, 718-719
interaction and variation within boxes, 719-720
one sample, 644
residual variation, 713-714
row means, 713-714, 718-720

several samples, 651n

two samples, 651-652
variation within boxes or cells, 718-719
within columns, 709-710
of sample, 216
Variance of population, estimated (see Variance)
Variation:
additive nature of, 461-462, 709
and coefficients of determination (see Explained variation)

between column means, 7(n> 70, s, 712, 715, 815-S1<;

between row means, 712-713, 715
coefficient of, 222-223
due to interaction, 718
explained (see Explained variation)
residual, 713
total (see Total variation)
unexplained (see Unexplained variation)
within boxes or cells, 715-71?
within columns, 708-709, HiO
Varying horizontal-scale charts, 89-90
Verhulst, P. F., 315
Vij^noc, A, J., 732n 


W

Wald, A., 30n
Walker, Helen M., 30n, tiSn, 690n, (>2(*n, 638n, 720n

Washburn, ll. S., 523 

Weekly Index of Business Activity, N. Y. Times, 41 1* 445
Weld, L. D., 590
West, Helen, 590

Wholesale Commodity Prices, Index of, 141-142, 398-400, 438-439
V/ilcv. N. C'.. 045 
Wm^rev. H , 230 
Wincrield, A H., 472 
Winston, Ellen, 447n 
Winston, fi , 103, 220. 22R 
Working, H., 207n
Working days, flexible calendar of, 738-739

Y

Yates, F., 27n, 289n, 750, 753, 759
Yates' correction, iWUi 007. 071. 089
Year-over-year chart, 90, 3o9, 372
Yule, G. U., 480n, 594n

Z

Z chart, 87, 89
Zero on vertical scale of charts, 70-80
Zero-order coefficients, 515
z transformation, 723-725, 736



